What plain-to-Markdown realistically does
Plain-to-Markdown is a normalization conversion. The input is plain text that may or may not contain Markdown-like syntax fragments. If the input already looks like Markdown, the output is Markdown that parses correctly; if the input is pure prose, it passes through with only whitespace and line-ending cleanup.
The honest framing matters because the name of the route can imply something the engine does not do. The converter does not invent heading hierarchy from prose. It does not auto-detect which line is a section title and which is a paragraph. It does not create links from URL-shaped substrings. It does not turn dash-prefixed lines into list items if those lines do not already match Markdown list syntax. Detection looks for explicit syntax signals; it does not guess at intent.
What the engine does is straightforward: it normalizes whitespace and line endings, strips zero-width and non-printable characters, collapses excessive blank lines, and routes the result down the matching path. If the input contains Markdown signals (headings, lists, fenced code, bold, italic, links, escapes, tables), it is treated as Markdown and re-rendered as cleaner Markdown. If it contains no Markdown signals, it is wrapped as a single paragraph and emitted with only that cleanup applied.
This makes plain-to-Markdown more of a "Markdown-aware text cleanup" than a "structural reverse engineering" tool. That framing avoids the failure mode of users expecting it to understand intent and getting frustrated when it doesn't.
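The signal check described above can be sketched as a handful of regular expressions tested against the whole input. This is a minimal TypeScript sketch under stated assumptions: the name looksLikeMarkdown and every pattern are illustrative, modeled on the signal list rather than taken from the converter's source.

```typescript
// A minimal sketch of signal-based detection. The patterns are illustrative
// assumptions modeled on the signal list above, not the converter's real rules.
const MARKDOWN_SIGNALS: RegExp[] = [
  /^#{1,6}[ \t]/m,                    // ATX heading marker: "# Title"
  /^[ \t]*(?:[-*+]|\d+\.)[ \t]/m,     // list start: "- item", "* item", "1. item"
  /^```/m,                            // fenced code opener
  /\*\*[^*\n]+\*\*/,                  // **bold**
  /(?:^|\W)_[^_\n]+_(?:\W|$)/m,       // _italic_ (but not snake_case identifiers)
  /\[[^\]\n]+\]\([^)\s]+\)/,          // [text](url)
  /\\[\\`*_{}\[\]()#+.!|-]/,          // backslash escape of a punctuation character
  /^\|.*\|[ \t]*\n\|?[ \t]*:?-{3,}/m, // pipe table header followed by a delimiter row
];

// Returns true as soon as any single signal matches anywhere in the text.
function looksLikeMarkdown(text: string): boolean {
  return MARKDOWN_SIGNALS.some((pattern) => pattern.test(text));
}
```

Because a single match is enough, one stray heading marker anywhere in the text routes the entire input to the Markdown path.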
The engine's plain text path
When the input does not contain any of the Markdown signals (no # heading marker, no - or 1. list start, no ``` fenced code, no **bold**, no _italic_, no [text](url), no escape sequences, no |...|---|...| table), the detector classifies it as plain text. The converter then takes a single, content-preserving path:
- Run the normalizer on the input. NBSP becomes regular space; soft hyphens are removed; zero-width characters are stripped; trailing whitespace per line is dropped; runs of three or more consecutive newlines collapse to two; CRLF and CR line endings become LF.
- Wrap the entire normalized text as a single paragraph node in the AST.
- Render the paragraph back to Markdown.
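The normalization steps listed above map naturally onto a chain of string replacements. The sketch below is hypothetical (the function name normalizePlainText and the exact set of zero-width code points are assumptions) and applies the steps in an order that keeps them independent: line endings first, so the per-line trailing-whitespace rule only sees LF, and blank-line collapsing last, so lines that contained only spaces are counted as blank.

```typescript
// Hypothetical normalizer following the steps listed above; not the engine's source.
function normalizePlainText(input: string): string {
  return input
    .replace(/\r\n?/g, "\n") // CRLF and lone CR -> LF
    .replace(/\u00A0/g, " ") // NBSP -> regular space
    .replace(/\u00AD/g, "") // remove soft hyphens
    // Zero-width space, word joiner, BOM. ZWNJ (U+200C) and ZWJ (U+200D) are assumed
    // to be kept, because some scripts and emoji sequences depend on them.
    .replace(/[\u200B\u2060\uFEFF]/g, "")
    .replace(/[ \t]+$/gm, "") // drop trailing whitespace on every line
    .replace(/\n{3,}/g, "\n\n"); // collapse 3+ consecutive newlines to 2
}
```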
The output is the input text with cleaner whitespace and consistent line endings, which is useful even when no structural conversion happens. If you paste content from a styled source — a word-processed document, a clipboard transfer from a rich-text editor, a chat client — the normalizer removes invisible artifacts that often cause downstream rendering issues. Other source formats (binary documents, spreadsheets, image-based files) are not in scope on this route — use the dedicated converter for that source format.
The output preserves paragraph breaks (blank lines between text blocks) but does not infer paragraph boundaries beyond what the input already shows. A wall of text without blank lines stays a wall of text. A text with double-newline-separated paragraphs stays paragraph-separated.
When the input already looks like Markdown
This is where the converter does more interesting work. Many users have text that is mostly plain prose but contains a few Markdown-shaped fragments — a bullet list pasted from another note, a heading line typed by hand with ## prefix, a fenced code block with ```. The detector picks up on these signals and routes the entire input through the Markdown parser.
The Markdown parser used internally is marked, configured for GitHub Flavored Markdown (GFM). It handles:
- Headings with # through ###### markers
- Setext-style headings underlined with = or -
- Bullet lists with -, *, or + markers
- Numbered lists with 1., 2., etc.
- Fenced code blocks with ```
- Inline code with single backticks
- Bold with ** or __
- Italic with * or _
- Strikethrough with ~~
- Links with [text](url) and image syntax
- Blockquotes with > prefix
- GFM pipe tables
- Horizontal rules with ---
Once the detector routes the input to this parser, it builds a structured AST and re-renders the content in canonical Markdown. The benefits show up as cleaner output: standard list markers, consistent emphasis style, and normalized whitespace inside formatting markers.
A common case: pasting from a tool that uses inconsistent Markdown style. Some authors use * for bullets, others use -; some use **bold**, others use __bold__. Routing through the parser produces a single canonical style in the output, which is easier to maintain.
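To make the canonical-style idea concrete, here is a rough line-based approximation. It is an assumption-laden sketch, not how the converter works: the converter rebuilds output from a parsed AST, whereas this hypothetical canonicalizeStyle function only rewrites * and + bullets to - and __bold__ to **bold**, skipping fenced code and thematic breaks.

```typescript
// A rough, line-based approximation of style canonicalization. The real converter
// re-renders from a parsed AST; these regexes only handle simple, non-nested input.
function canonicalizeStyle(markdown: string): string {
  let inFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      if (/^```/.test(line)) {
        inFence = !inFence; // entering or leaving a fenced code block
        return line;
      }
      if (inFence) return line; // never rewrite code
      if (/^\s*([*_-])(\s*\1){2,}\s*$/.test(line)) return line; // thematic break like "* * *"
      return line
        .replace(/^(\s*)[*+](\s+)/, "$1-$2") // "* item" or "+ item" -> "- item"
        .replace(/__([^_\n]+)__/g, "**$1**"); // "__bold__" -> "**bold**"
    })
    .join("\n");
}
```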
Another common case: leniency around spacing inside emphasis markers. The engine accepts ** text ** (with spaces just inside the bold markers) and produces **text** on output. This handles a common copy-paste artifact from chat interfaces and word-processed documents.
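If you want the same spacing cleanup in your own pipeline, it can be approximated with a single regular expression. This hypothetical tightenBold pre-pass is an assumption about how such leniency might work, not the engine's implementation.

```typescript
// Hypothetical pre-pass: remove spaces just inside paired ** markers, so "** text **"
// becomes "**text**". Under CommonMark rules alone, "** text **" is not bold, because
// an opening ** followed by whitespace cannot start emphasis.
function tightenBold(text: string): string {
  return text.replace(
    /\*\*[ \t]*([^*\n]*?)[ \t]*\*\*/g,
    (_match: string, inner: string) => `**${inner}**`,
  );
}
```

The pattern pairs markers left to right within a line, so it would also rewrite arithmetic such as 2 ** 3 ** 4 into 2 **3** 4; treat it as a sketch for prose, not for code-heavy text.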
Before/after — three plain text styles
A pure prose paragraph with no formatting goes in:
The convert function takes a string and an output format and returns a result object. The function is synchronous and runs in the browser without external dependencies.
And comes out as the same text, normalized and wrapped as a single paragraph, with any stray whitespace from the source cleaned up:
The convert function takes a string and an output format and returns a result object. The function is synchronous and runs in the browser without external dependencies.
A paragraph that happens to use Markdown signals goes in:
The **convert** function takes a string and an output format. See [the docs](https://example.com) for details. The function is synchronous.
And comes out as canonical Markdown (identical here, because the input already uses the canonical style):
The **convert** function takes a string and an output format. See [the docs](https://example.com) for details. The function is synchronous.
A multi-section text with headings and a list goes in:
# Setup
Install dependencies:
* Node 20+
* npm 10+
# Usage
Import the function and call it.
And comes out as canonical Markdown:
# Setup
Install dependencies:
- Node 20+
- npm 10+
# Usage
Import the function and call it.
Notice the bullet markers normalized from * to -, the headings preserved, and the spacing kept consistent.
Edge cases in signal detection
Mixed signals trip the detector toward Markdown. If your "plain" text contains a single ## heading marker by accident — perhaps from a comment or a quoted message — the detector classifies the entire input as Markdown and routes it through the Markdown parser. This usually produces correct output but can change rendering of incidental syntax. If you specifically want plain-text-only behavior, remove ambiguous markers before pasting.
Bare URLs do not auto-link. A line containing https://example.com stays as literal text in the output, because the Markdown parser used here does not auto-link bare URLs. If you want a clickable link, write the Markdown link syntax [https://example.com](https://example.com), or the angle-bracket autolink form <https://example.com>, which CommonMark defines as a link.
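If a document has many bare URLs, converting each by hand is tedious. A hypothetical pre-pass (wrapBareUrls is an illustrative name, not part of the tool) can wrap them in the angle-bracket form before pasting. It only wraps URLs preceded by whitespace or the start of the text, so URLs already inside [text](url) or <...> are left alone, and it keeps trailing sentence punctuation outside the brackets.

```typescript
// Hypothetical pre-pass: wrap bare http(s) URLs in angle brackets so CommonMark
// renderers treat them as autolinks. URLs preceded by "(", "[" or "<" are skipped.
function wrapBareUrls(text: string): string {
  return text.replace(
    /(^|\s)(https?:\/\/[^\s<>()]+)/g,
    (_match: string, lead: string, url: string) => {
      // Keep trailing sentence punctuation (".", ",", etc.) outside the link.
      const trail = url.match(/[.,;:!?]+$/)?.[0] ?? "";
      const core = trail ? url.slice(0, -trail.length) : url;
      return `${lead}<${core}>${trail}`;
    },
  );
}
```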
Headings are not inferred from prose. A line like Section title followed by content does not become ## Section title automatically. Plain text with structural intent that wasn't expressed in syntax stays plain. If your source has visual headings (centered text, ALL CAPS lines, double-newline-bounded titles), those visual cues are not parsed as headings — they pass through as paragraphs.
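The converter deliberately leaves visual headings alone. If your source reliably marks section titles as short ALL CAPS lines on their own, you can add heading markers yourself before pasting. The sketch below is a hypothetical heuristic of exactly the kind the converter avoids; the name promoteCapsHeadings, the length limits, and the allowed punctuation are arbitrary assumptions.

```typescript
// Hypothetical pre-pass, not part of the converter: promote a short ALL CAPS line
// that stands alone between blank lines (or text edges) to a "##" heading.
function promoteCapsHeadings(text: string): string {
  const lines = text.split("\n");
  // Out-of-range indexes count as blank, so the first and last lines can qualify.
  const isBlank = (i: number): boolean =>
    i < 0 || i >= lines.length || lines[i].trim() === "";
  return lines
    .map((line, i) => {
      const trimmed = line.trim();
      const looksLikeTitle =
        trimmed.length >= 3 &&
        trimmed.length <= 60 &&
        /^[A-Z][A-Z0-9 ,:&'()-]*$/.test(trimmed); // caps letters, digits, light punctuation
      return looksLikeTitle && isBlank(i - 1) && isBlank(i + 1) ? `## ${trimmed}` : line;
    })
    .join("\n");
}
```

Review the result by hand: an isolated acronym line or a shouted sentence will be promoted too.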
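Lists are detected only by syntax. Manually formatted lines such as (1) First, 1.First (no space after the period), or -First (no space after the dash) are not detected as a list: the detector looks for 1. (digit, period, space) or - (dash, space), and these variants stay as paragraph text. Lines numbered 1) First are not a detection signal either, although CommonMark also accepts 1) as an ordered-list marker, so such lines may still render as a list if other signals route the input through the parser.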
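If your source numbers items as 1) or (1), a small pre-pass can rewrite them to the 1. form before pasting. normalizeNumberedMarkers is a hypothetical helper, and it deliberately does not touch 1.First or -First, because inserting a space there would also rewrite ordinary text such as 3.14 or -5.

```typescript
// Hypothetical pre-pass: rewrite "1) item" and "(1) item" to the "1. item" form the
// detector recognizes. Only a leading marker followed by whitespace is touched;
// review the output, because a line like "(2020) was a year" is rewritten too.
function normalizeNumberedMarkers(text: string): string {
  return text
    .split("\n")
    .map((line) => line.replace(/^([ \t]*)\(?(\d{1,9})\)[ \t]+/, "$1$2. "))
    .join("\n");
}
```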
Indented blocks are sensitive to spaces. Markdown uses indentation to indicate code blocks (4 spaces) and nested list items. If your plain text has decorative indentation (paragraph indents from a Word doc), the parser may interpret the indented lines as code blocks, producing unexpected output. Run input through a "remove leading whitespace" pass before pasting if your source uses paragraph indentation.
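The "remove leading whitespace" pass can be as small as the sketch below. It is hypothetical: stripLeadingWhitespace is an illustrative name, and it assumes any real code in the input is fenced with ```, so fenced lines keep their indentation while every other line becomes flush-left.

```typescript
// Hypothetical pre-pass: strip leading spaces and tabs from every line so decorative
// paragraph indents cannot be read as indented code blocks. Lines inside ``` fences
// keep their indentation so real code is not damaged.
function stripLeadingWhitespace(text: string): string {
  let inFence = false;
  return text
    .split("\n")
    .map((line) => {
      const flush = line.replace(/^[ \t]+/, "");
      if (flush.startsWith("```")) {
        inFence = !inFence; // opening or closing fence
        return flush;
      }
      return inFence ? line : flush;
    })
    .join("\n");
}
```

This also flattens nested list indentation, so apply it only to sources where indentation is purely decorative.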
Long single-line paragraphs are preserved as-is. Some plain-text sources (legal documents, transcripts, machine-generated logs) have very long lines without word wrapping. The converter preserves these lines exactly; word wrapping is a destination concern, not a Markdown concern.
Trailing whitespace and zero-width characters are normalized. This is the most useful transformation for plain text input — it removes invisible characters that came from a copy-paste from a styled source. The output is cleaner and more predictable for downstream rendering.
What manual structuring still needs to be done
The plain-to-Markdown conversion handles cleanup and Markdown-aware re-rendering, but it does not produce structural Markdown from unstructured prose. If your input is a document where structure should be added, plan for manual editing:
Add heading markers (#, ##, ###) at section boundaries. The converter cannot infer where they go.
Add list markers (- or 1. ) on lines that should be list items. Visual cues like indentation or initial caps don't translate.
Add link syntax for URLs that should be clickable. Bare URLs in plain text stay plain.
Add emphasis (**bold**, *italic*) where the source had typographic emphasis (italic font, weight differences in a Word doc). Plain text loses these cues entirely.
Reorganize content if the source flow needs editorial restructuring. The converter preserves order; reordering is editing.
For workflows that produce structure-rich output from unstructured input, language models are usually the right tool — describe the desired Markdown structure to an LLM and feed it the plain text. The plain-to-Markdown converter is a cleanup and normalization step in that pipeline, not a replacement for the structuring step.
Workflow for migrating notes
The most common workflow this route serves is moving content out of an unstructured note app or a word-processed document into a Markdown-based knowledge base (Obsidian, Logseq, Bear, Notion, GitHub wiki). The migration steps look like this:
- Export source as plain text or copy to clipboard.
- Paste into the converter at vust.ai/markdown/plain-to-markdown.
- Run the conversion. Output is normalized; existing Markdown signals are canonicalized.
- Save the output and review for structure gaps.
- Manually add headings, list markers, and links where the visual source had them but the plain text didn't preserve them.
- Save to your destination.
The web tool is browser-only and has no daily processing limit beyond the free-tier rate limit. The Telegram bot at @vustMarkdownBot runs the same engine; send the plain text and request Markdown output. Bot conversions are billed against your crystal balance.
Images, attached files, and other formats outside plain text are not in scope on this route. The dedicated converters under /markdown/* handle specific source formats; this route is for plain text only.
The result is honest: a Markdown-aware cleanup of plain text input, with canonical re-rendering when Markdown signals are present, and a cleaner version of the input when they aren't. It saves time on the cleanup step, but the structuring step is still yours.