Convert Word DOCX to Markdown

Drop a Word document — get Markdown with headings, lists, and inline styles preserved. Browser-based, no upload to our servers.

Free · No signup · Instant

Max 10 MB · Processed in your browser · Nothing uploaded to our servers.

Browser-only · Preserves structure · Fast

This tool handles

  • .docx files up to 10 MB
  • Headings, lists, bold/italic
  • Links and inline code
  • Basic GFM tables

Not in scope

  • Legacy .doc format
  • Merged table cells
  • Equation Editor blocks
  • SmartArt and text boxes

Try it live with the widget above: drop a .docx file and see the output instantly. For the items under “Not in scope,” the migration guide below covers workarounds and when to use a different tool.

DOCX to Markdown examples

Each example shows the Word input first, then the Markdown output it produces.

Article with H2/H3

DOCX

Word document with headings

Markdown

## Introduction
Text…

### Section 1
More text…

Bulleted list

DOCX

Word bullet list

Markdown

- Item one
- Item two
- Item three

Inline formatting

DOCX

Word text with bold and italic runs

Markdown

Word text with **bold** and *italic*

How DOCX-to-Markdown conversion works

01

Drop DOCX — Drag & drop or pick a .docx file (up to 10 MB). Processing runs in your browser.

02

Convert — Click Convert — headings, lists, bold/italic, links, and tables are preserved.

03

Copy — Copy the Markdown or download as `.md`.

DOCX edge cases we handle

Only .docx (not .doc)

Legacy .doc files are not supported. Open in Word and Save As .docx first.

Images embedded

Images can be extracted as base64 data URIs — note output size may grow. Plain-image strip mode available as fallback.

Tables with merged cells

Merged cells are simplified to their top-left value; the structure collapses to a standard GFM table.

Comments and tracked changes

Comments and revisions are ignored by default. We emit a warning if the document contained tracked changes.

What is a DOCX-to-Markdown conversion?

Microsoft Word's .docx format is a ZIP archive containing XML files — word/document.xml for the body, word/styles.xml for formatting definitions, word/_rels/ for linkage, plus optional media and settings. That XML carries full structural fidelity: heading levels, paragraph styles, lists, tables, tracked changes, comments, embedded images, fonts. Converting a DOCX to Markdown means reading that XML and mapping it onto Markdown primitives. Headings map cleanly. Lists map cleanly. Bold, italic, links, inline code map cleanly. Tables map to GFM pipe syntax with some loss. Track changes, complex footnotes, embedded Excel objects, equation-editor blocks — these are lossy or skipped.
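Because a .docx is an ordinary ZIP archive, you can inspect its parts with any ZIP library. The sketch below uses Python's standard `zipfile` module; the helper names are illustrative, and `make_toy_package` builds a stand-in archive with the same layout rather than a valid Word file.

```python
import io
import zipfile


def list_docx_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of the parts stored inside a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return sorted(archive.namelist())


def read_document_xml(docx_bytes: bytes) -> str:
    """Return the raw XML of the main document body as text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read("word/document.xml").decode("utf-8")


def make_toy_package() -> bytes:
    """Build a tiny ZIP with a docx-like layout (demonstration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/styles.xml", "<w:styles/>")
        archive.writestr("word/_rels/document.xml.rels", "<Relationships/>")
    return buffer.getvalue()
```

With a real file, pass `open("draft.docx", "rb").read()` (a hypothetical path) to `list_docx_parts` to see which parts the document actually contains.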

Our converter uses mammoth.js, a widely used open-source DOCX conversion library, running entirely in your browser. You drop a .docx file, mammoth parses the XML, produces structured Markdown, and you copy the output. The .docx file itself never leaves your machine: extraction is client-side. Only the resulting Markdown text is passed to our server, and only for whitespace normalization.

Why convert DOCX to Markdown?

The DOCX-to-Markdown journey is one of the most common document conversions in modern workflows. Writers draft in Word, but target systems (blogs, docs sites, static generators, note apps, developer documentation) want Markdown. The handoff is a bottleneck: copy-paste from Word into a web form strips formatting and introduces junk HTML; saving as .html produces bloated output full of <span> wrappers and embedded styles. Mammoth-based conversion produces clean semantic Markdown — headings, lists, bold, italic, links, tables — without the noise.

Specific scenarios driving this conversion:

  • Authors submitting blog posts to a Jekyll/Hugo site, who want their Word drafts as MD before the CI pipeline publishes.
  • Technical writers migrating from SharePoint / Confluence internal wikis (which accept DOCX upload) to modern Git-based docs (Docusaurus, GitBook, MkDocs).
  • Researchers converting thesis chapters written in Word into notes for Obsidian or Logseq.
  • Teachers moving lecture notes from .docx archives to Markdown-based learning management systems.
  • Journalists archiving long-form drafts in git repositories instead of Dropbox folders.

Also: LLM preprocessing. Modern chat models accept Markdown better than DOCX. If you want to summarize a Word document with ChatGPT or Claude, the cleanest path is DOCX → Markdown → LLM. Sending the .docx directly forces the model to parse XML; sending extracted plain text loses headings and structure. Markdown is the middle ground.

Manual approach

Word → Markdown by hand is tedious for anything beyond a page. In Word, you select the content, copy, and paste into a Markdown editor. What you get depends on the editor: Obsidian converts some formatting; VS Code pastes raw text; browser-based editors keep HTML (usually messy). Then you walk the text: # in front of headings, - in front of list items, ** around bold, * around italic, [text](url) around links. Tables are the worst — you retype each one as GFM pipe syntax.

For a 10-page Word document, expect 30-60 minutes of hand-conversion. Common mistakes: missing headings where you forgot the #, inconsistent list nesting, stray styles that the paste brought through as HTML wrappers, broken tables. Ugly but tractable for a single document. For a library of 50 Word docs, impractical.

Automated approach (our tool)

Our converter uses mammoth.convertToMarkdown() from the mammoth npm package (maintained by Michael Williamson since 2013, ~3k GitHub stars, broad adoption). Mammoth's philosophy is "semantic conversion": map Word styles to Markdown features, ignore visual formatting that Markdown can't express.

What mammoth covers well:

  • Heading 1-6 styles → # through ######
  • Paragraph styles (Normal, Quote, Code) → plain text, blockquotes, code blocks
  • Bold, italic, underline (as **bold**, *italic*, underline→bold fallback)
  • Hyperlinks → [text](url)
  • Ordered and unordered lists, including nested levels
  • Tables → GFM pipe table syntax (cell-by-cell, no merged cells)
  • Inline code (if styled as Code Char in Word)
  • Images → base64 data URIs (can be disabled with a conversion option; by default we include them)
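The idea of "semantic conversion" in the list above can be illustrated with a toy mapper that looks only at a paragraph's style name and ignores fonts, sizes, and colors. This is a simplified sketch for intuition, not mammoth's code; the function name and the small set of handled styles are assumptions.

```python
# Word heading style names mapped to Markdown heading prefixes.
HEADING_PREFIX = {f"Heading {level}": "#" * level for level in range(1, 7)}


def paragraph_to_markdown(style: str, text: str) -> str:
    """Map one (style name, text) pair to a line of Markdown."""
    if style in HEADING_PREFIX:
        return f"{HEADING_PREFIX[style]} {text}"
    if style == "Quote":
        return f"> {text}"
    # Any other style, including Normal, falls back to plain text.
    return text
```

Note what the sketch leaves out on purpose: a paragraph styled Normal but set in 24 pt bold still comes out as plain text, because only the style name carries meaning.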

Our integration:

  1. Client-side extraction via mammoth.convertToMarkdown({arrayBuffer}) — the .docx ArrayBuffer is passed to mammoth, which returns {value: markdown, messages: [...]}.
  2. 10 MB file size cap (Word documents with many embedded images can swell; for most prose docs this is ample).
  3. 15-second parse timeout.
  4. Messages from mammoth (warnings about unsupported elements) are surfaced in the output panel as warnings.
  5. Server-side post-processing: whitespace normalization and paragraph-break consolidation.
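If you want to apply a similar cleanup yourself, step 5 might look roughly like the sketch below. The exact rules our server applies are not spelled out here, so treat the three steps (line-ending unification, trailing-whitespace removal, blank-line collapsing) as assumptions about what "whitespace normalization" and "paragraph-break consolidation" mean.

```python
import re


def normalize_markdown(markdown: str) -> str:
    """Tidy whitespace in converted Markdown (assumed rules, see above)."""
    # Unify Windows and old Mac line endings.
    text = markdown.replace("\r\n", "\n").replace("\r", "\n")
    # Drop trailing spaces and tabs on every line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse runs of blank lines into a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # End the file with exactly one newline.
    return text.strip("\n") + "\n"
```

Two caveats with this simple version: it also removes the two trailing spaces that Markdown uses for hard line breaks, and it does not treat fenced code blocks specially, so blank lines inside code are collapsed too.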

Output quality is consistently high for typical business and academic Word documents. For heavily-formatted documents with complex tables, merged cells, text boxes, or SmartArt, expect partial loss.

Common gotchas

Old .doc format not supported. Mammoth parses .docx only — the modern Open XML format introduced in Office 2007. Legacy .doc files (Office 97-2003 binary format) are not supported. To convert a .doc, open it in Word or LibreOffice and Save As .docx first. Then run our tool on the .docx.

Images become base64 data URIs by default. Word documents with embedded photos or diagrams produce Markdown with inline ![alt](data:image/png;base64,...) that can make the output very large. Base64 encodes every 3 bytes as 4 characters, so image data grows by about a third: a Word doc carrying 5 MB of photos produces roughly 6.7 MB of Markdown. Two mitigations: (a) accept the bloat if your target system renders data URIs (not every renderer does, so check first); (b) strip images from the Markdown output (search-and-replace !\[.*?\]\(data:.*?\)) and re-embed by linking to separately-saved image files.
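A minimal sketch of the strip-images mitigation, plus the size arithmetic behind the bloat. The pattern is a slightly stricter variant of the search-and-replace expression above; the function names are illustrative.

```python
import re

# Matches Markdown images whose target is a data: URI.
DATA_URI_IMAGE = re.compile(r"!\[[^\]]*\]\(data:[^)]*\)")


def strip_data_uri_images(markdown: str) -> tuple[str, int]:
    """Remove inline data-URI images; return (cleaned text, number removed)."""
    return DATA_URI_IMAGE.subn("", markdown)


def base64_length(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)
```

Images that point at ordinary paths or URLs are left alone, so you can run this after re-linking some images by hand. `base64_length(3_000_000)` gives 4,000,000 characters, which is the "grows by about a third" rule in concrete terms.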

Tables with merged cells collapse. Mammoth supports basic GFM tables: row-per-row, cell-per-cell. Merged cells (in Word: Select cells → right-click → Merge Cells) become simplified — the merged region's value appears in the top-left cell; the other cells become empty. Headers of tables are recognized if you used "Header Row" style; otherwise the first row is just data.
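To make the collapse concrete, the sketch below models a table row as (text, column span) pairs, expands each merged cell into its value followed by empty cells, and renders the result as a GFM pipe table. It illustrates the behavior described above and is not mammoth's implementation; both function names are hypothetical.

```python
def expand_merged_row(cells: list[tuple[str, int]]) -> list[str]:
    """Turn (text, colspan) pairs into one flat row of cell strings."""
    expanded: list[str] = []
    for text, colspan in cells:
        expanded.append(text)                   # value stays in the first cell
        expanded.extend([""] * (colspan - 1))   # the rest of the span is empty
    return expanded


def to_gfm_table(rows: list[list[str]]) -> str:
    """Render rows as a GFM pipe table; the first row is the header."""
    width = max(len(row) for row in rows)
    padded = [row + [""] * (width - len(row)) for row in rows]

    def render(cells: list[str]) -> str:
        # Escape literal pipes so they do not split a cell.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    header, *body = padded
    lines = [render(header), render(["---"] * width)]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)
```

A row such as `[("Total", 2), ("42", 1)]` becomes `["Total", "", "42"]`: the information that "Total" spanned two columns is gone, which is why merged tables usually need a manual pass afterwards.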

Tracked changes and comments are dropped. If your Word document has Track Changes enabled with unaccepted revisions, the output depends on how Word serialized them. By default, mammoth shows the "final" version (accepted state), ignoring inline markup. Comments (right-margin speech bubbles in Word) are not included in the output. We warn when a document contained tracked changes, but the revision history and the comment text do not survive conversion. Accept or reject all changes in Word before converting, and resolve or delete comments, so that nothing you still need is discarded.

Text boxes and SmartArt don't convert. Word's "Text Box" elements (floating text) and "SmartArt" diagrams (org charts, process flows) are ignored. They appear as blank spots in the output. If critical content lives in text boxes, convert them to inline paragraphs in Word first. For SmartArt, export separately as images.

Equations are lossy. Word's equation editor (introduced 2007) stores equations as OMML. Mammoth doesn't convert OMML to LaTeX — equations appear as placeholder text or are skipped. For academic documents with equations, either hand-reconstruct after conversion, or use Pandoc (which has an OMML → LaTeX path).

Footnotes move to the end. Word footnotes are flattened: the footnote number marker stays inline and the footnote text appears at the end of the output. This usually works but can be awkward for dense-footnote academic writing.

Heading levels depend on correct style usage. Our converter relies on Heading 1 / Heading 2 / etc. styles to produce # / ##. If the author manually bolded and enlarged text without applying a heading style, those "visual headings" become plain bold paragraphs in the output. Fix in Word by applying proper heading styles before conversion.
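One way to spot "visual headings" after conversion is to look for short lines that consist of nothing but bold text. The sketch below is a heuristic with assumed thresholds (at most 12 words, no sentence-ending punctuation); it accepts both `**` and `__` as bold markers since converters differ on which they emit.

```python
import re

# A line that is entirely wrapped in one pair of bold markers.
BOLD_ONLY = re.compile(r"^(\*\*|__)(?P<text>.+)\1$")


def find_visual_headings(markdown: str, max_words: int = 12) -> list[tuple[int, str]]:
    """Return (line number, text) for bold-only lines that look like headings."""
    candidates = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = BOLD_ONLY.match(line.strip())
        if not match:
            continue
        text = match.group("text")
        if "**" in text or "__" in text:
            continue  # several bold runs on one line, not a single heading
        if text.endswith((".", "!", "?")):
            continue  # reads like an emphasized sentence
        if len(text.split()) <= max_words:
            candidates.append((number, text))
    return candidates
```

Treat the hits as a review list rather than an automatic fix: the reliable repair is still to apply a real heading style in Word and convert again.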

When to use a different tool instead

For .doc files: convert to .docx first in Word/LibreOffice, then use our tool.

For full-fidelity academic/technical conversion: Pandoc with -f docx -t gfm handles more edge cases — OMML equations, complex footnotes, embedded Excel objects. Pandoc requires a local install but gives higher quality for complex documents.

For bulk conversion: Pandoc via command line in a loop over your .docx files. Our tool is one-file-at-a-time in the browser; script the CLI path for libraries.
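A minimal sketch of such a loop, written in Python around the Pandoc command line. It assumes `pandoc` is installed and on your PATH; the folder names in the usage note are placeholders.

```python
import subprocess
from pathlib import Path


def pandoc_command(source: Path, out_dir: Path) -> list[str]:
    """Build the pandoc invocation that converts one .docx to GFM."""
    target = out_dir / source.with_suffix(".md").name
    return ["pandoc", "-f", "docx", "-t", "gfm", str(source), "-o", str(target)]


def convert_folder(src_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every .docx in src_dir; return the Markdown paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(src_dir.glob("*.docx")):
        if source.name.startswith("~$"):
            continue  # skip Word's temporary lock files
        # check=True raises CalledProcessError if pandoc fails on a file.
        subprocess.run(pandoc_command(source, out_dir), check=True)
        written.append(out_dir / source.with_suffix(".md").name)
    return written
```

Calling `convert_folder(Path("drafts"), Path("markdown"))` would process a `drafts` folder in one go. Because `check=True` stops at the first failure, wrap the call in your own error handling if you would rather log bad files and continue.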

For Word documents you want to summarize, not convert: use the text export path (File → Export → Plain Text or .txt) and feed to an LLM directly. Summarization doesn't need full structure.

For confidential documents: the .docx file is parsed in your browser and is never uploaded. This matters when your DOCX contains client info, unreleased research, or legal drafts. Keep in mind that the converted Markdown text is sent to our server for whitespace normalization, so for material that must not leave your machine at all, use a local tool such as Pandoc.

Migration workflow

A practical workflow for moving DOCX content into Markdown-based systems:

  1. Accept all Track Changes, resolve comments. In Word: Review → Accept → Accept All Changes; Review → Comments → delete all. Our tool won't include either, so finalize the document first.
  2. Apply heading styles consistently. Walk the document, make sure Heading 1 is used for top-level sections, Heading 2 for subsections, etc. If the author used manual bolding for headings, apply the proper styles — this is the only way to get #/## markers in the output.
  3. Decide image handling. If images are essential, let them come through as data URIs (our default). If you'd rather handle images separately, set a post-conversion search-and-replace to strip them; extract images from the .docx manually (rename to .zip, open, pull from word/media/).
  4. Convert via our tool. Drop the file, click Convert. Review the output for: heading levels, list nesting, table structure, stray formatting messages in the warnings panel.
  5. Clean up warnings. Mammoth reports unsupported elements ("Unrecognized paragraph style: X"). Review these and decide: ignore (unusual style you don't care about) or fix in source (apply a standard style and re-convert).
  6. Handle complex tables by hand. If your Word document has merged cells, multi-row headers, or nested tables, hand-reconstruct in GFM pipe syntax after conversion.
  7. Add frontmatter and, where relevant, a canonical URL. Target systems (Jekyll, Hugo, Docusaurus) expect YAML frontmatter. Add title, date, author, tags, and layout hints, plus a canonical URL field if the piece was first published elsewhere.
  8. Preview in target platform. Open the converted Markdown in your destination: GitHub, your static site local preview, or your note-taking app. Common issues: table rendering (check GFM compatibility), image URIs too large (strip or split), heading nesting (ensure ## doesn't skip to #### without ### in between).
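Steps 7 and 8 lend themselves to a small script. The sketch below prepends YAML front matter and reports heading-level skips such as ## followed directly by ####. It is a simplified example: every front matter value is written as a quoted string (so list fields such as tags are not covered), and the field names your platform expects are up to you.

```python
import re


def add_front_matter(markdown: str, fields: dict[str, str]) -> str:
    """Prepend a YAML front matter block with quoted string values."""
    lines = ["---"]
    for key, value in fields.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key}: "{escaped}"')
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown


def find_heading_skips(markdown: str) -> list[tuple[int, int, int]]:
    """Return (line number, previous level, level) for each skipped level."""
    skips = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.startswith("```"):
            in_fence = not in_fence  # ignore '#' lines inside code fences
            continue
        if in_fence:
            continue
        match = re.match(r"^(#{1,6})\s", line)
        if not match:
            continue
        level = len(match.group(1))
        if previous and level > previous + 1:
            skips.append((number, previous, level))
        previous = level
    return skips
```

An empty list from `find_heading_skips` means the outline descends one level at a time; any tuple it returns points at the line to fix before you preview in the target platform.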

For a 20-document DOCX library with standard formatting, expect 2-3 hours end-to-end. Heavily-formatted technical documents (thesis chapters, regulatory submissions) take longer; plan per-document review time.
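For step 3 of the workflow, the rename-to-.zip trick can be scripted so you do not have to unpack each file by hand. This sketch reads everything under word/media/ into memory using Python's standard `zipfile` module; the function name is illustrative, and writing the files to disk is left to the caller.

```python
import io
import zipfile
from pathlib import PurePosixPath


def extract_media(docx_bytes: bytes) -> dict[str, bytes]:
    """Return {file name: content} for every file under word/media/."""
    media = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            if name.startswith("word/media/") and not name.endswith("/"):
                # Keep only the base name, e.g. "image1.png".
                media[PurePosixPath(name).name] = archive.read(name)
    return media
```

Once the images are saved next to your Markdown, replace the stripped data URIs with ordinary relative links. Matching each extracted file to its place in the text is a manual step in this sketch.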
