
HTML Entity Encoder

Encode and decode HTML entities for safe HTML embedding

By Bikram Nath

Paste text containing HTML special characters and get back a safely escaped string ready for embedding in markup. This is most useful when injecting user-supplied content into HTML templates: the input `<script>alert(1)</script>` returns `&lt;script&gt;alert(1)&lt;/script&gt;`, which the browser renders as literal text instead of executing. Decoding works in the other direction, restoring pre-escaped strings from API responses or CMS exports where ampersands and angle brackets have already been entity-encoded.

What is HTML Entity Encoder?

This tool takes a raw string and replaces characters that carry structural meaning in HTML, specifically `<`, `>`, `&`, `"`, and `'`, with their named entity equivalents (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`) or numeric references. Running `AT&T <wireless>` through it returns `AT&amp;T &lt;wireless&gt;`, which is safe to place inside a paragraph, an attribute value, or any text node without disrupting surrounding markup.

Developers reach for an entity encoder when they need a one-off conversion without writing code. The same result is achievable in Python with `html.escape()`, in Node.js with the `he` library, or in a browser console with a two-liner that assigns the string to the `textContent` of `document.createElement('div')` and reads back its `innerHTML` (which escapes `&`, `<`, and `>` but leaves quote characters untouched). This tool is faster for spot-checks and paste-and-verify workflows, or when you are on a machine without those runtimes installed.
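For comparison, here is a minimal sketch of the Python route mentioned above, using only the standard-library `html` module. The variable names are illustrative. Note one assumption-worthy difference: `html.escape()` emits the numeric reference `&#x27;` for an apostrophe rather than `&apos;`, so its output can differ from this tool's on single quotes.

```python
import html

# The two sample inputs used on this page.
raw_text = "AT&T <wireless>"
raw_script = "<script>alert(1)</script>"

# html.escape() replaces &, <, > and (because quote=True is the default)
# both quote characters.
encoded_text = html.escape(raw_text)
encoded_script = html.escape(raw_script)

# Apostrophes come back as a hex numeric reference, not &apos;.
encoded_apostrophe = html.escape("It's")

print(encoded_text)        # AT&amp;T &lt;wireless&gt;
print(encoded_script)      # &lt;script&gt;alert(1)&lt;/script&gt;
print(encoded_apostrophe)  # It&#x27;s
```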

One precise technical boundary: the minimum safe set for HTML text nodes is `&`, `<`, and `>`. For attribute values you additionally need to encode the wrapping quote character. A string placed inside `onclick` is decoded by the HTML parser first, then evaluated as JavaScript, so entity-encoding a payload destined for a script context does not prevent execution. Entity encoding is the correct defence only for text nodes and attribute values, not for script or style contexts.
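The decode-then-evaluate order can be observed without a browser. The sketch below uses Python's standard `html.parser.HTMLParser`, which, like a browser's HTML parser, resolves character references inside attribute values; `AttrCollector` is a hypothetical helper written for this illustration, and the snippet only shows what string would reach the JavaScript engine, it does not run any script.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with entities already decoded.
        self.attrs.update(attrs)


# The apostrophes in the handler are entity-encoded in the markup.
markup = '<button onclick="alert(&#39;hi&#39;)">Go</button>'

collector = AttrCollector()
collector.feed(markup)
collector.close()

# The parser hands over the decoded source, which is valid JavaScript.
handler_source = collector.attrs["onclick"]
print(handler_source)  # alert('hi')
```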

When to use HTML Entity Encoder

Encode a CMS-sourced product title before injecting it into an email HTML template to prevent broken layout or tag injection.
Decode an XML API response where ampersands arrive as `&amp;` so downstream string comparisons work against the original value.
Verify that a server-side sanitisation function is encoding a specific edge-case character correctly before shipping user-input rendering to production.
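
The decode use case above can be sketched in Python with the standard `html.unescape()` function. The sample strings are invented for illustration and are not taken from any real API.

```python
import html

# Hypothetical value as it might arrive in an entity-encoded API response.
received = "Ben &amp; Jerry&#39;s"

# The original value we want to compare against.
expected = "Ben & Jerry's"

# A direct comparison fails because the entities are still present.
matches_before = received == expected

# html.unescape() resolves named and numeric references in one pass.
decoded = html.unescape(received)
matches_after = decoded == expected

print(matches_before, matches_after)  # False True
```

A single pass is deliberate here: a value that was encoded twice (for example `&amp;amp;`) decodes to `&amp;` after one call, which makes double-encoding upstream visible rather than silently hiding it.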

Frequently Asked Questions

What is the difference between named entities like `&amp;` and numeric references like `&#38;`?
Both represent the same character, but a named entity depends on the parser recognising the name. HTML defines more than two thousand names, while XML predefines only five (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`). Numeric decimal (`&#38;`) and hex (`&#x26;`) references work in HTML, XHTML, and XML parsers alike. If you are generating content that may end up inside an XML feed, SVG file, or Atom document, prefer numeric references. Unless a DTD declares them, XML parsers do not recognise HTML names such as `&nbsp;` or `&varsubsetneqq;`, and an undeclared entity makes the document not well-formed, so the parser rejects it. An HTML parser, by contrast, leaves an unrecognised name as literal text.
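A short way to see the difference, assuming Python's standard `xml.etree.ElementTree` as a representative XML parser. `parse_text` is a hypothetical helper for this sketch; it returns the element text, or `None` if the parser rejects the fragment.

```python
import xml.etree.ElementTree as ET


def parse_text(fragment):
    """Hypothetical helper: element text, or None if the XML is rejected."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None


# An HTML-only named entity: undefined in XML without a DTD.
named = parse_text("<p>caf&eacute;</p>")

# The same character as a numeric reference: accepted everywhere.
numeric = parse_text("<p>caf&#233;</p>")

# One of the five entities XML predefines.
predefined = parse_text("<p>AT&amp;T</p>")

print(named)       # None
print(numeric)     # café
print(predefined)  # AT&T
```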
Does HTML-encoding a string prevent XSS in all contexts?
Only in HTML text nodes and quoted attribute values. In a `javascript:` href or an inline event handler like `onclick`, the HTML parser decodes the entities first and the result is then evaluated as JavaScript, so a payload such as `alert(&#39;x&#39;)` still runs as `alert('x')`. Inside a `<script>` block the parser does not decode entities at all: the encoded string reaches the JavaScript engine verbatim, which corrupts the data and leaves JavaScript-significant characters such as backslashes and line breaks untouched. For those contexts, entity encoding is the wrong defence layer. Use a Content Security Policy that blocks inline scripts and avoid constructing handlers from user-supplied strings entirely.
Why does the decoder sometimes produce an invisible extra space instead of a normal space?
That character is almost certainly a non-breaking space, U+00A0, encoded as `&nbsp;`. Unlike a regular space (U+0020), it does not collapse under CSS white-space rules and prevents line-breaks between adjacent words. CMS systems and rich-text editors insert `&nbsp;` frequently, and pasting HTML from Word or Google Docs is the most common source. To confirm, inspect the bytes in a hex viewer: a non-breaking space encodes as two bytes `0xC2 0xA0` in UTF-8, while a regular space is the single byte `0x20`.
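If a hex viewer is not to hand, the same check can be sketched in Python. The sample string is invented for illustration.

```python
import html

# Decode a string containing &nbsp; between a number and its unit.
decoded = html.unescape("10&nbsp;kg")

# The third character looks like a space but is U+00A0.
suspect = decoded[2]
is_nbsp = suspect == "\u00a0"

# UTF-8 byte sequences for the two kinds of space.
nbsp_bytes = suspect.encode("utf-8")
space_bytes = " ".encode("utf-8")

# Comparisons against a regular space fail until the character is replaced.
equal_before = decoded == "10 kg"
normalised = decoded.replace("\u00a0", " ")
equal_after = normalised == "10 kg"

print(is_nbsp, nbsp_bytes.hex(), space_bytes.hex())  # True c2a0 20
print(equal_before, equal_after)                     # False True
```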
Should I encode differently for HTML attribute values versus text node content?
The safe minimum for text nodes is `&`, `<`, and `>`. For attribute values you additionally need to encode the quote character wrapping the attribute: `&quot;` for double-quoted attributes, `&apos;` for single-quoted ones. Skipping quote encoding in attributes enables injection via input like `" onmouseover="evil()`. This tool encodes the full standard set including both quote types, which is the correct default when you do not control which quote style the surrounding template uses. Note that `&apos;` is valid in HTML5 and XHTML but was not part of the HTML4 named entity set; the numeric form `&#39;` is the portable alternative when older parsers matter.
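The attribute injection described above can be reproduced with Python's `html.escape()`, whose `quote` parameter toggles quote encoding. This is a sketch under the assumption of a double-quoted `value` attribute; the template string is hypothetical.

```python
import html

# Hypothetical template with a double-quoted attribute.
template = '<input value="{}">'

# The injection payload from the answer above.
payload = '" onmouseover="evil()'

# Text-node-only escaping (quote=False) leaves the double quotes intact,
# so the payload closes the attribute and adds a new one.
unsafe = template.format(html.escape(payload, quote=False))

# Full escaping (quote=True, the default) keeps the payload inside the value.
safe = template.format(html.escape(payload, quote=True))

print(unsafe)  # <input value="" onmouseover="evil()">
print(safe)    # <input value="&quot; onmouseover=&quot;evil()">
```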
Do multibyte characters like accented letters or emoji need to be encoded?
Not in HTML5 documents served as UTF-8. The browser reads them directly from the byte stream, so encoding `é` as `&#233;` or emoji as their numeric references is harmless but unnecessary verbosity. The only characters that genuinely require encoding are the ones with structural meaning in HTML: `&`, `<`, `>`, and context-dependent quote characters. Encoding everything outside ASCII makes sense only if you are generating HTML for a legacy environment that enforces ASCII-only output, such as certain XML serialisers or older HTML-generating email libraries.
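For the legacy ASCII-only case, one possible approach in Python is the built-in `xmlcharrefreplace` error handler, which turns every non-ASCII character into a decimal numeric reference. `to_ascii_html` is a hypothetical helper name; this is a sketch, not a description of how this tool works internally.

```python
import html


def to_ascii_html(text):
    """Hypothetical helper: escape markup characters, then force ASCII."""
    # Escape structural characters first, so the ampersands introduced by
    # the numeric references in the next step are not escaped again.
    escaped = html.escape(text)
    # Replace each non-ASCII character with a &#NNN; reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


accented = to_ascii_html("café <b>")
emoji = to_ascii_html("\U0001F600")  # grinning face, U+1F600

print(accented)  # caf&#233; &lt;b&gt;
print(emoji)     # &#128512;
```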
