Convert HTML to Markdown

Paste any HTML — including Telegram-formatted text or rich-text from your editor — and get clean Markdown ready for Jekyll, Hugo, GitHub, or Notion. No signup, instant.

Free, no signup. Up to 1 MB per request.

Instant conversion · GFM-compatible · Telegram entities

HTML to Markdown examples

Paste real HTML in the left column, see the Markdown output on the right.

Telegram HTML

HTML

<b>Project</b> <i>Status:</i> <code>OK</code>

Markdown

**Project** _Status:_ `OK`

Rich text

HTML

<h2>Title</h2><ul><li>Item</li></ul>

Markdown

## Title

- Item

Code-rich

HTML

<pre><code class="lang-js">const x = 1;</code></pre>

Markdown

```js
const x = 1;
```

How HTML-to-Markdown conversion works

01. Paste HTML: paste your HTML into the input field.

02. Convert: click Convert and get Markdown instantly.

03. Copy: copy the Markdown to the clipboard or download it as `.md`.

HTML edge cases we handle

Telegram entities

Custom emoji (`tg-emoji`), text-link entities, and code-block language attributes are detected and preserved — no other converter handles these correctly.

Code highlighting

`<code class="lang-X">` is converted to fenced blocks with language hint preserved (` ```X `).

GFM tables

Semantic HTML tables convert to GFM pipe-table syntax with header alignment kept.

HTML entities

Encoded entities like `&amp;`, `&lt;`, `&nbsp;` are decoded to their literal characters.

What HTML-to-Markdown actually converts

HTML-to-Markdown is a content-extraction conversion. The input is a fragment of HTML — copied from a web page, exported from a CMS, pulled from a clipboard rich-text paste — and the output is Markdown that preserves the readable content while dropping presentation. It is not a faithful round-trip; HTML and Markdown describe overlapping but not identical sets of structures, and any honest converter will tell you what it keeps and what it drops.

Our engine handles a deliberate, tightly-scoped subset of HTML. The set was chosen so the converter behaves predictably on the kinds of HTML people actually paste into it: article paragraphs, formatted spans, inline links, ordered and unordered lists, blockquotes, inline code, and code blocks. Tags outside this set are silently skipped — their text content survives, but the wrapping is stripped. That includes headings, tables, images, divs, spans, and most page-layout markup.

The reason for the narrow set is reliability. A web page contains hundreds of layout, scripting, and styling tags that mean nothing to Markdown. Trying to round-trip every nested div and class would either bloat the output with stray tags or invent Markdown syntax that does not exist in the spec. We chose to keep what the conversion can faithfully represent and drop the rest cleanly.

When you reach for this conversion

The recurring use cases are practical, not theoretical. The most common is moving a paragraph of formatted text out of a web app — a task description from a project tool, a comment from a forum, a help-article excerpt — and into a Markdown destination such as a documentation page, a wiki entry, an issue tracker, a chat message, or a knowledge base. The HTML carries the formatting; the Markdown destination needs the formatting in its own dialect.

The second case is rich-text email or content-management exports. CMS authors who paste from a WYSIWYG editor end up with HTML cluttered by inline styles, font tags, and class attributes. Routing that through HTML-to-Markdown produces clean Markdown that strips the editor noise.

The third case is preparing prompts for language models. LLMs accept Markdown-formatted prompts, and that formatting often improves response structure. If you have HTML source content (a help article, a transcript, a document fragment) and want to feed it to an LLM for summarization, rephrasing, or question answering, Markdown is a more compact representation that typically uses fewer tokens than the equivalent HTML.

A fourth, smaller case is content auditing. When you want to see only the content of an HTML fragment — what would survive in a plain-text grep, an accessibility scan, or a search index — Markdown is a stripped-down view that hides layout while preserving reading order.

How the parser sees your HTML

Our converter does not run a browser engine or an XML parser, and it does not build a full DOM tree. It scans the input with a tag-matching regex, builds an AST of allowed nodes, and renders that AST to Markdown. The advantages are speed and predictability. The disadvantages are real: malformed HTML, deeply nested structures, and tags that depend on browser-style auto-correction may produce output that looks subtly off.

The allowed tags are: p, br, strong, em, b, i, u, s, a, code, pre, ul, ol, li, and blockquote. Anything else is dropped at the tag boundary while the text inside is kept. Because there is no browser-style auto-correction, a malformed <strong> without a matching </strong> does not silently extend across paragraphs; the engine treats unclosed tags conservatively.
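The allowlist behavior described above can be sketched in a few lines of Python. This is only an illustration of "drop the tag, keep the text", not the engine's actual code: the regex, `ALLOWED_TAGS`, and `strip_disallowed_tags` are hypothetical names, and the real parser builds an AST rather than rewriting the string.

```python
import re

# Tag names taken from the allowed list in the text above.
ALLOWED_TAGS = {
    "p", "br", "strong", "em", "b", "i", "u", "s",
    "a", "code", "pre", "ul", "ol", "li", "blockquote",
}

# Matches an opening, closing, or self-closing tag and captures its name.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*>")


def strip_disallowed_tags(html: str) -> str:
    """Remove tags outside ALLOWED_TAGS while keeping the text between them."""

    def keep_or_drop(match: re.Match) -> str:
        name = match.group(1).lower()
        # Allowed tags pass through unchanged; everything else is removed.
        return match.group(0) if name in ALLOWED_TAGS else ""

    return TAG_RE.sub(keep_or_drop, html)


if __name__ == "__main__":
    sample = '<div><h2>Title</h2><p>Hi <span class="x">there</span></p></div>'
    print(strip_disallowed_tags(sample))
```

In the sample, the div, h2, and span wrappers are removed while their text and the surrounding p element survive.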

URL handling deserves a mention. The <a href="..."> link target is sanitized: only http://, https://, and mailto: schemes are kept. A javascript: URL, a data: URL, an ftp: URL, or a relative path is rejected — the link becomes plain text. This is a deliberate security boundary; the converter is designed to be safe to use on untrusted input.
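A minimal sketch of that scheme check, assuming Python's standard `urllib.parse`. The names `SAFE_SCHEMES`, `is_safe_href`, and `link_or_text` are illustrative only; the converter's own implementation is not shown in this document.

```python
from urllib.parse import urlsplit

# Schemes the text above lists as kept; everything else is rejected.
SAFE_SCHEMES = {"http", "https", "mailto"}


def is_safe_href(href: str) -> bool:
    """Return True only for http://, https://, and mailto: targets."""
    try:
        scheme = urlsplit(href.strip()).scheme.lower()
    except ValueError:
        # urlsplit raises on some malformed inputs; treat those as unsafe.
        return False
    # Relative paths have an empty scheme, so they are rejected as well.
    return scheme in SAFE_SCHEMES


def link_or_text(text: str, href: str) -> str:
    """Render a Markdown link, or fall back to plain text for unsafe targets."""
    return f"[{text}]({href})" if is_safe_href(href) else text


if __name__ == "__main__":
    print(link_or_text("lazy dog", "https://example.com"))
    print(link_or_text("click me", "javascript:alert(1)"))
```

Checking against an allowlist of schemes, rather than a blocklist of known-bad ones, means any scheme nobody anticipated is rejected by default.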

The normalizer runs on the input by default. It strips zero-width characters, replaces non-breaking spaces with regular spaces, removes soft hyphens, normalizes line endings, and collapses runs of three or more consecutive newlines into two. These are non-stylistic cleanups — content does not change meaning, but the output stops carrying the artifacts of a copy-paste from a Word document or a styled web page.
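The cleanup steps listed above might look roughly like the following. This is a sketch under assumptions: the exact set of zero-width characters the normalizer strips is not stated here, so `ZERO_WIDTH` is a guess, and `normalize` is a hypothetical name.

```python
import re

# Assumed set of zero-width characters: ZWSP, ZWNJ, ZWJ, word joiner, BOM.
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"
ZERO_WIDTH_TABLE = {ord(ch): None for ch in ZERO_WIDTH}


def normalize(text: str) -> str:
    """Apply the non-stylistic cleanups described in the text above."""
    # Normalize Windows and old Mac line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip zero-width characters.
    text = text.translate(ZERO_WIDTH_TABLE)
    # Remove soft hyphens.
    text = text.replace("\u00ad", "")
    # Replace non-breaking spaces with regular spaces.
    text = text.replace("\u00a0", " ")
    # Collapse runs of three or more newlines into exactly two.
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__":
    messy = "a\u200bb\u00a0c\r\n\r\n\r\n\r\nd\u00adone"
    print(repr(normalize(messy)))
```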

Concrete before/after

A typical paragraph with a link and inline emphasis goes in:

<p>The <strong>quick brown fox</strong> jumps over the <a href="https://example.com">lazy dog</a>.</p>

And comes out as:

The **quick brown fox** jumps over the [lazy dog](https://example.com).

A bullet list with mixed inline formatting goes in:

<ul>
  <li>First item with <em>emphasis</em></li>
  <li>Second item with <code>inline code</code></li>
  <li>Third item with a <a href="https://example.com">link</a></li>
</ul>

And comes out as:

- First item with *emphasis*
- Second item with `inline code`
- Third item with a [link](https://example.com)

A blockquote with two short paragraphs goes in:

<blockquote>
  <p>The mind is everything.</p>
  <p>What you think, you become.</p>
</blockquote>

And comes out as:

> The mind is everything.
>
> What you think, you become.

These are clean cases. The conversion fidelity is high when the source HTML stays inside the allowed tag set and uses well-formed open/close pairs.

Edge cases that surprise people

Headings disappear. The engine does not include h1, h2, h3, h4, h5, or h6 in the allowed tag set. The text inside the heading survives, but the heading wrapper does not — <h2>Section title</h2> becomes the line Section title with no ## prefix. If your HTML source uses real heading tags, the rendered Markdown will not show document hierarchy. Workaround: post-process the output to add ## markers on the lines that used to be headings, or restructure the source HTML before pasting.
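One way to automate the post-processing workaround is to read the heading levels from the source HTML and re-apply them to the converted text. The sketch below assumes each heading's text lands on its own line in the output, as described above; `collect_headings` and `restore_headings` are hypothetical helpers, not part of the converter. Headings that contain inline formatting will not match and are left untouched.

```python
import re
from html import unescape

# Captures the heading level and the raw inner HTML of h1..h6 elements.
HEADING_RE = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
INNER_TAG_RE = re.compile(r"<[^>]+>")


def collect_headings(html: str) -> list[tuple[int, str]]:
    """Return (level, plain_text) for each heading, in document order."""
    headings = []
    for match in HEADING_RE.finditer(html):
        text = unescape(INNER_TAG_RE.sub("", match.group(2))).strip()
        if text:
            headings.append((int(match.group(1)), text))
    return headings


def restore_headings(markdown: str, headings: list[tuple[int, str]]) -> str:
    """Prefix lines that match the next pending heading with # markers."""
    pending = list(headings)
    out = []
    for line in markdown.split("\n"):
        if pending and line.strip() == pending[0][1]:
            level, text = pending.pop(0)
            out.append("#" * level + " " + text)
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    source = "<h2>Section title</h2><p>Body text.</p>"
    converted = "Section title\n\nBody text."
    print(restore_headings(converted, collect_headings(source)))
```

Matching headings in document order, one at a time, keeps a body line that happens to repeat an earlier heading's text from being promoted by mistake.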

Tables drop entirely. HTML tables are not in the allowed tag set. A <table><tr><td>...</td></tr></table> element produces no Markdown table syntax — only the cell text, joined by spaces. If you need tabular data converted, the right tool is the dedicated CSV-to-Markdown converter, not HTML-to-Markdown. For data already in HTML table form, copy the source as TSV from the rendered web page and paste through CSV-to-Markdown with the Tab delimiter.
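If you would rather script the table step than use a separate tool, tab-separated text copied from a rendered table can be turned into GFM pipe syntax along these lines. This is an illustrative sketch, not the CSV-to-Markdown converter's implementation; `tsv_to_pipe_table` is a hypothetical name, and the first row is assumed to be the header.

```python
def tsv_to_pipe_table(tsv: str) -> str:
    """Convert tab-separated rows into a GFM pipe table (first row = header)."""
    rows = [line.split("\t") for line in tsv.split("\n") if line.strip()]
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def format_row(cells: list[str]) -> str:
        # Pad short rows and escape pipes so cell content cannot break the table.
        padded = cells + [""] * (width - len(cells))
        escaped = [cell.strip().replace("|", "\\|") for cell in padded]
        return "| " + " | ".join(escaped) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([format_row(header), separator, *map(format_row, body)])


if __name__ == "__main__":
    print(tsv_to_pipe_table("Name\tQty\nApples\t3\nPears\t5"))
```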

Images are stripped. The <img> tag is not allowed. An image in your HTML source produces no Markdown image syntax (![alt](src)) and no fallback text; the image element is removed. If image references matter, add the Markdown image syntax manually after the conversion. Image hosting and asset upload are out of scope for this converter.
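To make the manual step less tedious, you could collect the image references from the source HTML before converting. The sketch below uses Python's standard `html.parser`; `ImageCollector` and `image_markdown` are hypothetical helpers, and deciding where each image belongs in the converted text is still up to you.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects Markdown image syntax for every <img> that has a src."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are routed here by the base class too.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            alt = attributes.get("alt") or ""
            self.images.append(f"![{alt}]({src})")


def image_markdown(html: str) -> list[str]:
    """Return Markdown image references in document order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.images


if __name__ == "__main__":
    sample = '<p>Logo: <img src="https://example.com/logo.png" alt="Logo"></p>'
    for reference in image_markdown(sample):
        print(reference)
```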

Inline style attributes do nothing. Tags carry their class and style attributes through the parser, but the engine ignores everything except href on <a> tags. A <span style="color: red">important</span> produces just important — no Markdown for color (Markdown doesn't have one), no preserved style attribute, no fallback marker.

<br> becomes a soft line break. Inside a paragraph, <br> produces a newline. This works for verse, addresses, and inline line breaks. It does not produce a paragraph break — for paragraph breaks, use <p> start and end tags in the source.

Code outside <pre> is always inline. A <pre><code>...</code></pre> pair (the standard pattern for fenced code) renders as a fenced code block, and a <pre> without an inner <code> also works. A <code> without an enclosing <pre>, however, is treated as inline code regardless of length, so a long snippet wrapped only in <code> loses its block formatting and is rendered as a single backtick-wrapped string.

Whitespace inside <pre> is preserved literally. Indentation, tabs, multiple spaces — all retained. This is correct for code blocks but can produce odd-looking output if the source HTML had non-code content wrapped in <pre> for visual reasons.

Link text equal to URL gets unwrapped. When the link text matches the URL (<a href="https://example.com">https://example.com</a>), the engine outputs just the URL, not [https://example.com](https://example.com). This avoids visual redundancy, and destinations with link auto-detection (GFM, for example) still render the bare URL as a clickable link.
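The unwrapping rule is small enough to state as code. This is a sketch only: `render_link` is a hypothetical name, and whether the engine trims whitespace before comparing is an assumption.

```python
def render_link(text: str, href: str) -> str:
    """Emit a bare URL when the link text duplicates the target."""
    # Assumption: surrounding whitespace is ignored when comparing.
    if text.strip() == href.strip():
        return href.strip()
    return f"[{text}]({href})"


if __name__ == "__main__":
    print(render_link("https://example.com", "https://example.com"))
    print(render_link("lazy dog", "https://example.com"))
```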

What gets dropped on purpose

A short list of content the converter does not preserve:

  • Headings (h1–h6) as Markdown headings
  • Tables, table rows, and table cells
  • Images, including alt text in some cases
  • Iframes, embeds, scripts, styles
  • Form elements (input, select, textarea, button)
  • Custom elements and any tag outside the allowed list
  • HTML comments (<!-- ... -->)
  • Inline style, class, id attributes
  • Non-safe URL schemes (javascript:, data:, ftp:, relative paths) — the link unwraps to plain text

This is intentional. Markdown does not natively express most of these, and the converter does not invent syntax that does not exist in the spec.

If your source is not HTML at all (a PDF document, a Word file, a CSV table), use the corresponding dedicated converter; those formats are out of scope on this route.

Workflow on web vs Telegram

The web tool at vust.ai/markdown/html-to-markdown accepts HTML pasted into the textarea, runs the conversion in your browser session, and shows the Markdown result. No upload is required, and this route has no file-format support: paste HTML text only. The free web tier covers daily conversions; for higher volume, the Telegram bot at @vustMarkdownBot runs the same engine with no daily cap, limited only by your purchased crystal balance.

The Telegram path accepts the same HTML input as a regular message. The bot's reply is the converted Markdown text, ready to copy. Conversions are billed per send according to current bot pricing; web is rate-limited but free.

When working with HTML pasted from a web page, the cleanest results come from copying the article body or content fragment specifically, not the whole page source. A "Copy as HTML" browser extension or the right-click "Inspect → Copy outerHTML" route produces cleaner input than running the converter on a full-page download.

When the source is a CMS export, look for a "Save as HTML" or "Export as HTML" option. If only an XML or proprietary export is available, that input is out of scope; render it through your CMS's HTML preview first and copy the HTML from there.

Coverage gaps you should plan for

For longer documents, especially ones with section headings, your post-conversion workflow should include adding heading markers (#, ##, ###) at the right positions. The converter cannot infer heading levels from HTML structure when those tags are dropped.

For documents with tables, plan to extract the table separately. Either copy the table cells as TSV and run through the CSV-to-Markdown converter, or manually rewrite the table in Markdown pipe-syntax.

For images, plan to either keep them as separate files referenced from your destination, or manually add Markdown image syntax with the URL of where the image lives. The HTML-to-Markdown converter is text-only; images don't survive the conversion.

The result is honest: a fast, predictable, narrow converter that does well exactly what its scope describes, and tells you upfront what it doesn't handle. For most "paste this article into my Markdown notes" tasks, that is enough.

Process bigger files in @vustMarkdownBot

500-character free conversions in chat — pay-as-you-go for longer text.