Telegram entities are not Markdown
Telegram messages with formatting do not arrive as Markdown source. Telegram has its own way of representing message formatting: a separate array of "entities" alongside the plain message text. Each entity has a type (bold, italic, code, text_link, pre, etc.), a UTF-16 offset and length pointing into the message text, and optional metadata (a URL for text_link, a language for pre). The entities are positional metadata; the text itself stays unmarked.
This design lets Telegram clients render formatting natively without parsing inline syntax. It is also why copying a styled message out of Telegram into a plain-text destination preserves only the text and loses all the formatting — you only have access to the visible text, not the entity array.
The Telegram-to-Markdown converter closes that gap. Given the message text plus the entity array, it reconstructs the formatting as Markdown syntax: a bold entity wraps its text in **, an italic entity wraps it in *, a text_link produces [text](url), and so on. The output is portable Markdown that lives outside Telegram and is usable in notes, documentation, blog posts, and knowledge bases.
This conversion is specific to Telegram-aware workflows. Tools that ingest Telegram message exports, bot integrations that move content from Telegram to other systems, and personal knowledge bases that capture chat content all need the entity-to-Markdown step.
What an entity-formatted message contains
A Telegram message has two sides:
The text: the literal characters the message contains, with positions counted in UTF-16 code units. No special markers, no inline syntax, just the characters as displayed. Bold or italic text is not marked in the text itself.
The entities: an array of objects, each describing a formatting region. Each entity has:
- type: one of bold, italic, underline, strikethrough, spoiler, code (inline), pre (block), text_link, url, mention, hashtag, cashtag, bot_command, email, phone_number, mention_name, custom_emoji, blockquote, expandable_blockquote
- offset: starting index in the text, in UTF-16 code units
- length: length of the formatting region, in UTF-16 code units
- url: target URL (only on text_link entities)
- language: code language hint (only on pre entities)
- Other fields for specific types (user reference for mention_name, custom emoji ID for custom_emoji)
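As a rough illustration of the entity shape described above, the fields can be modeled as a small record. This is a sketch only: the `Entity` class and `parse_entity` helper are hypothetical names chosen for the example, not part of the converter or of any Telegram library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """One formatting region (hypothetical model for illustration)."""
    type: str                       # e.g. "bold", "text_link", "pre"
    offset: int                     # start position, in UTF-16 code units
    length: int                     # span length, in UTF-16 code units
    url: Optional[str] = None       # only present on text_link
    language: Optional[str] = None  # only present on pre


def parse_entity(raw: dict) -> Entity:
    """Build an Entity from one decoded JSON object, ignoring unknown keys."""
    return Entity(
        type=raw["type"],
        offset=int(raw["offset"]),
        length=int(raw["length"]),
        url=raw.get("url"),
        language=raw.get("language"),
    )


link = parse_entity(
    {"type": "text_link", "offset": 10, "length": 4, "url": "https://example.com"}
)
```

Keeping the optional fields as `None` when absent makes it easy for later steps to check whether a `url` or `language` was supplied at all.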
The entities can overlap or nest. A single character in the text might be inside a bold entity and an italic entity at the same time, producing bold-italic rendering. The converter handles the overlap by stacking the formatting markers in the order the entities are sorted.
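One way to picture the stacking is to collect opening and closing markers per position and emit them while walking the text. The sketch below is a simplified illustration, not the converter's actual code: the `MARKERS` table and `wrap_nested` function are hypothetical, it assumes the text has no surrogate pairs (so Python indices equal UTF-16 offsets), and it assumes entities are properly nested rather than partially overlapping.

```python
# Hypothetical marker table for this sketch; only three types are covered.
MARKERS = {"bold": "**", "italic": "*", "strikethrough": "~~"}


def wrap_nested(text, entities):
    """Insert Markdown markers for properly nested (type, offset, length) tuples."""
    # Outer entities first: by start offset, longest first on ties.
    ordered = sorted(entities, key=lambda e: (e[1], -e[2]))
    opens = {}   # position -> markers to open, outermost first
    closes = {}  # position -> markers to close, innermost first
    for kind, offset, length in ordered:
        marker = MARKERS.get(kind)
        if marker is None or length == 0:
            continue  # unknown or empty entity: leave the text untouched
        opens.setdefault(offset, []).append(marker)
        closes.setdefault(offset + length, []).insert(0, marker)

    out = []
    for i in range(len(text) + 1):
        out.extend(closes.get(i, []))  # close inner regions before opening new ones
        out.extend(opens.get(i, []))
        if i < len(text):
            out.append(text[i])
    return "".join(out)


print(wrap_nested("bold and italic", [("bold", 0, 15), ("italic", 9, 6)]))
# prints: **bold and *italic***
```

Partially overlapping entities (where one starts inside another and ends outside it) would need the inner region to be split at the boundary, which this sketch does not attempt.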
Our converter handles the entity types most commonly seen in chat content: bold, italic, underline, strikethrough, code, pre, text_link, url, blockquote, expandable_blockquote. Other types pass through as plain text, either because they are Telegram-internal references whose effect does not translate to Markdown (mention, hashtag, bot_command) or because Markdown has no equivalent for them (spoiler, custom_emoji).
How conversion preserves your text
The conversion is structural. The engine reads the entity array, builds an AST that represents each entity's effect on the text, and renders the AST as Markdown. The text content is preserved exactly — no characters added or removed; only the formatting markers from Markdown's syntax are inserted at the entity boundaries.
UTF-16 offset arithmetic is the technical heart of the engine. JavaScript strings are UTF-16, but most emoji and some rarer CJK characters (those outside the Basic Multilingual Plane) are surrogate pairs that span two UTF-16 code units while being a single visible character. Telegram's offsets are in UTF-16 code units. The converter respects the encoding, so offsets and lengths reach the right characters in the original text and the Markdown markers wrap the right ranges.
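The offset arithmetic matters most in languages whose strings are not indexed by UTF-16 code units. As a minimal sketch in Python (whose strings index by code point), the text can be encoded to UTF-16 and sliced at two bytes per code unit. The helper names `utf16_slice` and `utf16_to_index` are hypothetical, chosen for this example.

```python
def utf16_slice(text: str, offset: int, length: int) -> str:
    """Return the substring covered by a UTF-16 offset and length."""
    units = text.encode("utf-16-le")  # 2 bytes per code unit, no byte-order mark
    return units[2 * offset : 2 * (offset + length)].decode("utf-16-le")


def utf16_to_index(text: str, offset: int) -> int:
    """Convert a UTF-16 offset into a Python string index (code points)."""
    return len(text.encode("utf-16-le")[: 2 * offset].decode("utf-16-le"))


text = "Hi \U0001F600 bold"  # the emoji occupies two UTF-16 code units

print(utf16_slice(text, 6, 4))   # prints: bold
print(text[6:10])                # prints: old  (naive indexing is off by one)
print(utf16_to_index(text, 6))   # prints: 5
```

In JavaScript the same slice is simply `text.substring(offset, offset + length)`, because its string indices already count UTF-16 code units.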
Sorting matters too. Entities can arrive in any order, and overlapping entities need consistent rendering. The converter sorts by start offset, then by length with the longest first when offsets match, so nested formatting renders deterministically.
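The ordering described above can be expressed as a two-part sort key. This is a minimal sketch assuming entities are plain dictionaries with `offset` and `length` keys; `sort_entities` is a hypothetical helper name.

```python
def sort_entities(entities):
    """Order by start offset; on ties, put the longest (outermost) entity first."""
    return sorted(entities, key=lambda e: (e["offset"], -e["length"]))


shuffled = [
    {"type": "italic", "offset": 4, "length": 5},
    {"type": "code", "offset": 20, "length": 3},
    {"type": "bold", "offset": 4, "length": 15},
]

ordered = [e["type"] for e in sort_entities(shuffled)]
print(ordered)  # prints: ['bold', 'italic', 'code']
```

Python's `sorted` is stable, so entities with identical offset and length keep their arrival order, which keeps the output deterministic for a given input.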
URL safety is enforced on text_link and url entities. Only http://, https://, and mailto: schemes are kept. A text_link pointing to a javascript: or data: URL drops the link wrapping; the text content is preserved as plain text. This is the same security boundary as the HTML-to-Markdown converter — the engine is safe to use on untrusted message content.
Before/after — entity arrays vs Markdown source
A short message with bold and italic goes in as text plus entities:
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "entities": [
    { "type": "bold", "offset": 4, "length": 15 },
    { "type": "italic", "offset": 35, "length": 8 }
  ]
}
The text from offset 4 to 19 is "quick brown fox" (bold), and from 35 to 43 is "lazy dog" (italic). Output:
The **quick brown fox** jumps over the *lazy dog*.
A message with a text link goes in:
{
  "text": "Read more here.",
  "entities": [
    { "type": "text_link", "offset": 10, "length": 4, "url": "https://example.com" }
  ]
}
Output:
Read more [here](https://example.com).
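The link example above amounts to splicing a `[label](url)` wrapper around the entity's range. The sketch below shows that single step under two assumptions: the text contains no surrogate pairs (so Python indices match UTF-16 offsets), and the URL has already passed the scheme check. `apply_text_link` is a hypothetical helper, not the converter's API.

```python
def apply_text_link(text: str, offset: int, length: int, url: str) -> str:
    """Wrap text[offset:offset+length] in Markdown link syntax."""
    label = text[offset : offset + length]
    return text[:offset] + f"[{label}]({url})" + text[offset + length :]


result = apply_text_link("Read more here.", 10, 4, "https://example.com")
print(result)  # prints: Read more [here](https://example.com).
```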
A message with an inline code span and a code block goes in:
{
  "text": "Run npm install first, then:\n\nnpm run build\n\nThis should produce dist/.",
  "entities": [
    { "type": "code", "offset": 4, "length": 11 },
    { "type": "pre", "offset": 30, "length": 13 }
  ]
}
Output (the pre region is emitted as a fenced code block; the fence lines are not shown here):
Run `npm install` first, then:
npm run build
This should produce dist/.
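For the code and pre entities in the example above, the rendering step can be sketched as follows. The delimiter choices here follow common Markdown practice (a delimiter longer than any backtick run inside the content) and are an assumption, not a description of the converter's exact output. `render_code` and `render_pre` are hypothetical names.

```python
import re


def _longest_backtick_run(s: str) -> int:
    """Length of the longest run of consecutive backticks in s (0 if none)."""
    return max((len(run) for run in re.findall(r"`+", s)), default=0)


def render_code(code: str) -> str:
    """Inline code span; the delimiter is longer than any backtick run inside."""
    ticks = "`" * (_longest_backtick_run(code) + 1)
    # Pad with spaces when the content itself starts or ends with a backtick.
    pad = " " if code.startswith("`") or code.endswith("`") else ""
    return f"{ticks}{pad}{code}{pad}{ticks}"


def render_pre(code: str, language: str = "") -> str:
    """Fenced block; the language hint from a pre entity follows the opening fence."""
    fence = "`" * max(3, _longest_backtick_run(code) + 1)
    return f"{fence}{language}\n{code}\n{fence}"


print(render_code("npm install"))
print(render_pre("npm run build", "shell"))
```

Growing the delimiter instead of escaping keeps the code content byte-for-byte identical to the original message text.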
These three patterns — emphasis, links, code — cover the bulk of conversational formatting. The converter handles them with high fidelity.
URL handling and security boundary
The text_link entity is where security matters. A user or bot can craft a text_link with a URL of any scheme. Our converter validates against a whitelist (http, https, mailto) and drops the link wrapping for any other scheme. Examples:
- text_link with https://example.com → [text](https://example.com)
- text_link with mailto:user@example.com → [text](mailto:user@example.com)
- text_link with javascript:alert(1) → plain text (link dropped)
- text_link with data:text/html,<script>... → plain text (link dropped)
- text_link with file:///etc/passwd → plain text (link dropped)
- text_link with relative path/to/page → plain text (link dropped, no relative-URL semantics in standalone Markdown)
The same validation runs on url entities (auto-detected URLs in the message text). If a Telegram client formatted a string as a URL but the URL is unsafe, the entity wrapping is dropped. The original text always survives.
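A scheme allowlist of this kind can be sketched with Python's standard `urllib.parse`. The function names `is_safe_url` and `render_link` are hypothetical, and the real converter may apply further checks. The point is only that anything outside http, https, and mailto falls back to the visible text.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(url: str) -> bool:
    """True only when the URL carries an explicitly allowed scheme."""
    try:
        scheme = urlsplit(url.strip()).scheme.lower()
    except ValueError:  # malformed input, e.g. a broken IPv6 literal
        return False
    return scheme in ALLOWED_SCHEMES


def render_link(label: str, url: str) -> str:
    """Emit a Markdown link for safe URLs, otherwise just the visible text."""
    return f"[{label}]({url})" if is_safe_url(url) else label


print(render_link("text", "https://example.com"))   # prints: [text](https://example.com)
print(render_link("text", "javascript:alert(1)"))   # prints: text
print(render_link("text", "path/to/page"))          # prints: text
```

Relative paths have an empty scheme, so they are rejected by the same membership test without a separate rule.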
This is a deliberate boundary. The converter is intended to be safe even when processing message content from untrusted sources — public channels, group chats with strangers, automated forwarders. Markdown emitted by the converter does not surface unsafe URLs as clickable links.
Telegram features that don't survive
A handful of Telegram-specific formatting types do not have Markdown equivalents:
Spoilers are a Telegram visual feature where content is hidden until tapped. There is no spoiler syntax in standard Markdown. The converter renders spoiler text as plain text without any wrapper. Some Markdown extensions (Discord's ||spoiler||) do exist but are not portable; the safe choice is plain text.
Custom emoji are Telegram-specific stickers that render inline as message content. They have no Unicode codepoint or Markdown representation. The converter renders them as the underlying placeholder text (typically a fallback emoji that the client supplied) — but the rich custom-emoji effect does not translate.
Mentions, hashtags, cashtags, and bot commands stay as their literal text. @username stays as @username; #topic stays as #topic. The Markdown output does not include any link to a profile or hashtag search, because those are Telegram-internal references.
Phone numbers and email addresses that Telegram auto-formats as entities pass through as their literal text. The converter does not produce mailto: or tel: Markdown links from these unless an explicit text_link entity references them.
Mention by name (mention_name with a user reference) renders as the literal text. There is no portable way to express "this is a reference to user with internal ID X" in Markdown; the converter loses the reference and keeps only the visible text.
Quote (blockquote) and expandable blockquote are now first-class entities in modern Telegram. The converter renders both as standard Markdown blockquotes with a > prefix on each line. The expandable behavior (click to expand long quotes) is not preserved in Markdown output, but the quoted content is intact.
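Rendering a quoted region as a Markdown blockquote comes down to prefixing every line of the region. A minimal sketch, assuming the quoted text has already been extracted from the message (`render_blockquote` is a hypothetical helper):

```python
def render_blockquote(quoted: str) -> str:
    """Prefix each line with '> '; empty lines get a bare '>' so the quote stays joined."""
    return "\n".join(
        ">" if line == "" else "> " + line for line in quoted.split("\n")
    )


print(render_blockquote("first line\nsecond line"))
# prints:
# > first line
# > second line
```

The same function would serve both blockquote and expandable_blockquote, since the expand/collapse behavior has no Markdown counterpart.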
When you need this conversion
The recurring use cases concentrate around a few patterns:
Bot integrations exporting Telegram content. A custom bot saves chat messages to a knowledge base, a CRM, or a documentation system. The receiving system stores Markdown. The conversion runs at the bot ingestion step.
Manual archiving of important chats. Personal use case: copying a long, well-formatted chat thread out of Telegram into a permanent note. The conversion preserves the formatting that would otherwise be lost.
Documentation snippets shared via Telegram. Tech communities often share code snippets and step-by-step instructions in chat. When the content moves out of chat into a doc page or a wiki, the entity-to-Markdown conversion captures the formatting that the chat had.
Customer support transcript export. When a support workflow uses Telegram channels and the support team needs to archive transcripts, the Markdown export is more portable than raw entities or plain text.
Migration off Telegram. When a Telegram-first community moves to a different platform (Discord, Slack, a forum), preserving message formatting in the migration is the difference between readable and unreadable archives.
Workflow on web vs Telegram
The web tool at vust.ai/markdown/telegram-to-markdown accepts the message text and entities as JSON input. Paste the export from Telegram's API or your bot, run the conversion, get clean Markdown. The web flow is best when you have a structured source (an export, a saved API response).
The Telegram bot at @vustMarkdownBot accepts forwarded messages directly. When you forward a formatted message to the bot, Telegram delivers the message text plus entities to the bot through the Bot API. The bot runs the conversion and replies with the Markdown output. This is the cleanest path for ad-hoc, single-message conversions: no JSON to extract, no manual entity assembly.
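For readers building their own integration, the text and entities can be pulled out of a Bot API update roughly as follows. This is a sketch of a generic bot, not the code behind @vustMarkdownBot. It relies on the Bot API's `text`/`entities` fields for text messages and `caption`/`caption_entities` for media captions, and `extract_text_and_entities` is a hypothetical helper name.

```python
def extract_text_and_entities(update: dict):
    """Return (text, entities) from a decoded Bot API update, or ("", [])."""
    message = update.get("message") or update.get("channel_post") or {}
    if "text" in message:
        return message["text"], message.get("entities", [])
    if "caption" in message:  # photos, videos, and documents carry captions instead
        return message["caption"], message.get("caption_entities", [])
    return "", []


update = {
    "update_id": 1,
    "message": {
        "text": "Read more here.",
        "entities": [
            {"type": "text_link", "offset": 10, "length": 4, "url": "https://example.com"}
        ],
    },
}

text, entities = extract_text_and_entities(update)
```

A message without formatting simply has no `entities` key, which is why the helper defaults to an empty list.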
For batch conversions over a chat history, exporting to JSON via the Telegram desktop client and running through the web tool or a custom bot integration is the right pattern. Single-message conversions are best done in the bot.
The result is a converter purpose-built for Telegram's unique formatting model, with security guardrails on URLs, and explicit handling of every entity type — keeping what Markdown can express and dropping what it cannot, transparently.