
HTML to Markdown

Convert HTML to clean Markdown.

By Bikram Nath

Paste raw HTML and get back clean Markdown in one step. An `<h2>` tag containing a nested `<code>` element comes out as `## Heading` with the inline backtick code preserved, so it is immediately usable in a Hugo post or a GitHub README. Tables convert to GFM pipe syntax, which is the key difference from plain-text extraction tools that lose table structure entirely.


What is HTML to Markdown?

The converter walks an HTML string element by element, mapping each tag to its Markdown equivalent. Headings become ATX-style hash prefixes, anchor tags become `[text](url)`, and `<code>` inside a `<pre>` becomes a fenced code block with triple backticks. Pasting a raw blog post scraped from a CMS returns structured Markdown paragraphs with the heading hierarchy intact and all `style` attributes dropped.
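The element-by-element mapping described above can be sketched in a few lines of Python. This is a minimal illustration built on the standard library's `html.parser`, not this tool's actual implementation; the names `MiniMarkdown` and `to_markdown` are hypothetical, and the sketch covers only headings, paragraphs, links, and inline code (no `<pre>` blocks, lists, or tables).

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MiniMarkdown(HTMLParser):
    """Toy converter (hypothetical): headings, paragraphs, links, inline code."""

    def __init__(self):
        super().__init__()
        self.out = []     # pieces of Markdown output, joined at the end
        self.href = None  # URL of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        # Attributes such as style or class are never read, so they are dropped.
        if tag in HEADINGS:
            self.out.append("#" * int(tag[1]) + " ")  # ATX-style prefix
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")
        elif tag == "code":
            self.out.append("`")

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.out.append("\n\n")  # block boundary
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = None
        elif tag == "code":
            self.out.append("`")

    def handle_data(self, data):
        self.out.append(data)


def to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(to_markdown('<h2 style="color:red">Install <code>pip</code></h2>'))
```

Because the parser only reacts to tag names, a `style` attribute on the heading has no effect on the output, which matches the behavior described above.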

For one-off conversions, this beats installing Pandoc or setting up Python just to run html2text. Pandoc is the right choice when you are batch-converting files, need footnote or definition-list support, or want control over output flavor and line wrapping through options like `--to=gfm` and `--wrap`. html2text handles bulk pipeline work well. This tool covers the common case: an HTML snippet that needs to become readable Markdown without any local tooling.

The main gotcha is tables. HTML `<table>` elements convert to GFM pipe tables, which render correctly on GitHub, GitLab, and most static site generators, but strict CommonMark has no table syntax at all. If your target renderer supports only CommonMark with no extensions, those pipe characters appear as literal text. Separately, `<div>` wrappers with no semantic role generate extra blank lines, which can split content into stray paragraphs once the Markdown is rendered.
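To make the pipe-table shape concrete, here is a small Python sketch that turns a simple `<table>` into GFM rows. It assumes a well-formed table with explicit `<tr>` rows, treats the first row as the header, and ignores `colspan`, `rowspan`, and nested markup. The names `TableToGfm` and `table_to_gfm` are hypothetical and do not describe this tool's internals.

```python
from html.parser import HTMLParser


class TableToGfm(HTMLParser):
    """Collects cell text from <tr>/<th>/<td> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # completed rows, each a list of cell strings
        self.row = None   # row currently being built
        self.cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("th", "td") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            text = " ".join("".join(self.cell).split())  # normalize whitespace
            self.row.append(text.replace("|", "\\|"))    # escape literal pipes
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_to_gfm(html: str) -> str:
    parser = TableToGfm()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""
    header, *body = parser.rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",  # separator row
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


print(table_to_gfm(
    "<table><tr><th>Tool</th><th>Setup</th></tr>"
    "<tr><td>Pandoc</td><td>local install</td></tr></table>"
))
```

The separator row of `---` cells is what GFM-aware renderers use to recognize a table; a CommonMark-only renderer shows the same lines as ordinary text.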

When to use HTML to Markdown

Copy a live webpage's article HTML via DevTools and convert it to Markdown for a Hugo or Jekyll content directory.
Extract readable post bodies from a CMS database export where content columns store raw HTML strings.
Migrate an HTML email newsletter to plain Markdown prose before importing it into a Notion page or GitHub issue.

Frequently Asked Questions

Why do `<div>` wrappers produce extra blank lines in the Markdown output?
`<div>` is a block-level element with no Markdown equivalent, so the converter treats it as a block boundary and inserts a blank line where the tag was. If your HTML wraps every paragraph in a `<div class="content-block">`, the output gets double-spaced. The practical fix is to grab the innerHTML of just the article body node in DevTools rather than copying the full page source, which eliminates most layout-only wrapper divs before the conversion runs.
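If the double-spaced output is already in hand, a post-processing pass is another option. The following Python sketch collapses runs of blank lines to a single one; `collapse_blank_lines` is a hypothetical helper, not part of this tool, and it makes no attempt to protect blank lines inside fenced code blocks.

```python
import re


def collapse_blank_lines(markdown: str) -> str:
    # Empty out whitespace-only lines, but leave other lines untouched so
    # trailing-space hard line breaks in real content are preserved.
    lines = [line if line.strip() else "" for line in markdown.splitlines()]
    text = "\n".join(lines)
    # Three or more consecutive newlines mean two or more blank lines;
    # keep exactly one blank line between blocks.
    return re.sub(r"\n{3,}", "\n\n", text).strip() + "\n"


print(collapse_blank_lines("First\n\n\n\nSecond\n \n\nThird"))
```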
Do the converted tables work in all Markdown renderers, or only some?
HTML `<table>` elements convert to GFM pipe tables, the format GitHub popularized with rows like `| Col1 | Col2 |` and a separator row of `| --- | --- |`. GFM tables render correctly on GitHub, GitLab, Hugo, Jekyll, Docusaurus, and most Markdown editors. However, the CommonMark spec (commonmark.org) still omits table syntax as of version 0.31.2, released in January 2024. If your renderer enforces strict CommonMark with no extensions loaded, the output will appear as raw pipe-delimited text, not a formatted table.
What happens to `<span>` tags and inline `style` attributes during conversion?
`<span>` tags with no semantic role are unwrapped: the text content is kept but the tag disappears entirely. Inline `style` attributes are stripped, so colored text, custom font sizes, and `text-align` rules do not survive. `<strong>` and `<em>` map to `**bold**` and `_italic_` correctly, but `<span style="font-weight:bold">` loses its formatting because the converter reads element semantics, not CSS property values.
How does this compare to running `pandoc -f html -t gfm` locally?
Pandoc produces more complete output in most cases. It handles footnotes, definition lists, and reference-style link definitions, and its `--wrap` and `--columns` options control whether and where lines are wrapped. The output is also more consistent across complex nested structures. This browser tool requires zero setup and handles the standard article-to-Markdown case well. Reach for Pandoc when you are processing files in a script, need precise flavor control between `markdown_strict`, `gfm`, and `commonmark`, or have source HTML containing footnotes or other structures a browser converter silently drops.
Will deeply nested lists like `<ul>` inside `<li>` convert with correct indentation?
Two or three levels of nesting generally convert correctly, with child items indented two or four spaces. Very deep nesting, five or more levels, can produce inconsistent output because Markdown parsers disagree on indentation rules: parsers that follow the original Markdown rules, such as Python-Markdown, expect four spaces per nesting level, while CommonMark only requires a child item to be indented to where its parent's text begins (two spaces after `- `, three after `1. `). If the converted output renders flat in your target editor, check how many spaces the converter uses per indent level and adjust to match what your specific renderer expects.
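Adjusting the indent width can be scripted. The sketch below re-indents list items from two spaces per level to four; it assumes the converter emitted a consistent two-space indent, and the name `reindent_lists` is hypothetical. It only touches lines that begin with a list marker, so continuation lines and code blocks nested inside list items would need separate handling.

```python
import re

# Leading spaces, then a bullet (-, *, +) or an ordered marker (1. or 1)),
# then at least one space before the item text.
LIST_ITEM = re.compile(r"^( *)([-*+]|\d+[.)]) +")


def reindent_lists(markdown: str, old: int = 2, new: int = 4) -> str:
    out = []
    for line in markdown.splitlines():
        match = LIST_ITEM.match(line)
        if match and len(match.group(1)) % old == 0:
            depth = len(match.group(1)) // old  # nesting level under the old width
            line = " " * (depth * new) + line[len(match.group(1)):]
        out.append(line)
    return "\n".join(out)


print(reindent_lists("- a\n  - b\n    - c"))
```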
