Stripping HTML from a string using JavaScript seems like it should be a fairly trivial task. The problem is that a naive solution can easily open users up to XSS attacks.

Last week I had a look around the interwebs to see what the gods could teach me and came across a solution on Stack Overflow that uses the browser's DOM.

A naive solution is:

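function strip(html) {
   // Let the browser parse the markup inside a detached element...
   var tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   // ...then read back only the text, falling back to innerText on old IE.
   return tmp.textContent || tmp.innerText || "";
}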

I read through the answer and thought "Awesome! Let the browser do the heavy lifting!" This solution seems both reasonable and safe from XSS, since scripts are not supposed to run until they are attached to the DOM (or so I thought). I reassured myself by testing the strip function with an equally naive test string:


No alert was displayed.

A comment by Mike Samuel on the solution explained that "strip" was dangerous and should not be used with strings that come from untrusted sources.

Mike's counterexample shows how this version of strip is vulnerable to XSS:


You can see how Mike's example works in JSFiddle.

Bad jiji. Mike's example shows that scripts don't have to be inside a script tag, and some scripts are run even though the element is not attached to the DOM. My belief regarding when scripts are run was blown out of the water immediately.
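The exact string from the comment is not reproduced here, so what follows is only a rough sketch of the class of input involved. The payload (an image with a bogus source and an onerror handler), the payload variable name and the alert text are all assumptions for illustration. The call is guarded so the snippet also loads outside a browser, where there is no document to parse into.

```javascript
// Hypothetical payload: no <script> tag anywhere, just an event handler
// on an image whose source cannot be loaded.
var payload = "<img src=bogus onerror=\"alert('XSS')\">";

// The DOM-based strip from earlier in the post.
function strip(html) {
  var tmp = document.createElement("DIV");
  tmp.innerHTML = html; // parsing starts the image request right here
  return tmp.textContent || tmp.innerText || "";
}

if (typeof document !== "undefined") {
  // In a browser, the failed image load can fire onerror even though
  // tmp is never attached to the page.
  strip(payload);
}
```

The returned text is empty either way; the damage is done as a side effect of parsing, not through the return value.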

Where else was my assumption incorrect? I began to dig deeper to try to find other failure modes.

I checked whether asynchronously loaded scripts (scripts with the async attribute) would slip through as well. Nope. Next, I tried the defer attribute. Most browsers never execute a script inserted through innerHTML, with or without defer, but IE versions 6 through 9, which are still widely used, have their own legacy handling of defer.


I only tested IE8 and IE9: IE8 did not run the script, IE9 did. My assumption now has at least two verifiable exceptions.

Later in the Stack Overflow answers list, Mike presents a simple RegExp that removes the tags without ever touching the DOM:

html.replace(/<[^>]*>?/g, "");

This is much simpler than the original solution, with the added bonus that it works without running scripts, because the string is never handed to the browser's HTML parser.
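As a minimal sketch, the RegExp can be wrapped in a function and tried against a few inputs. The stripTags name and the sample strings are assumptions made for illustration, not part of the Stack Overflow answer.

```javascript
// Remove anything that looks like a tag. The trailing ">?" also catches
// an unterminated tag at the very end of the input.
function stripTags(html) {
  return String(html).replace(/<[^>]*>?/g, "");
}

console.log(stripTags("<p>Hello <b>world</b></p>"));
// Hello world

console.log(stripTags("<img src=bogus onerror=\"alert('XSS')\">") === "");
// true: the whole tag, handler included, is removed without being parsed

console.log(stripTags("<script>alert(1)</script>"));
// alert(1)  (left behind as inert text, never executed)

console.log(stripTags("Tom &amp; Jerry <i>rock"));
// Tom &amp; Jerry rock  (entities are not decoded)
```

Unlike the DOM version, this does not decode character entities, so the result is still HTML-escaped text, not plain text.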

A very important caveat: this solution is only safe for a DOM element's inner content. Different techniques should be used when assigning the value of a DOM element's attribute.
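To illustrate why attributes need different handling, here is a minimal sketch of escaping a value before placing it inside a double-quoted attribute. The escapeAttr helper and the sample value are hypothetical. The sketch assumes the attribute is always quoted and is not a URL or event-handler attribute, which need stricter rules.

```javascript
// Escape the characters that could end the attribute value or start markup.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// This value contains no tags at all, so a tag-stripping RegExp
// would pass it through unchanged.
var title = 'x" onmouseover="alert(1)';

var markup = '<span title="' + escapeAttr(title) + '">hi</span>';
console.log(markup);
// <span title="x&quot; onmouseover=&quot;alert(1)">hi</span>
```

When working with real DOM nodes, calling setAttribute with the raw value avoids building markup strings in the first place.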

To learn more about XSS and common mitigation techniques, read the Open Web Application Security Project (OWASP) XSS Prevention Cheat Sheet. The cheat sheet is an excellent resource that gives 9 rules to keep users safe from XSS. Read it, learn from it. Stay vigilant for your users' sake.