There are many SO questions (e.g. here and here) about how to do server-side scrubbing of Markdown produced by the WMD editor to ensure the HTML generated doesn't contain malicious script, like this:

<img onload="alert('haha');" 
   src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" />

But I didn't find a good way to plug the hole on the client side too. Client-side validation isn't a replacement for scrubbing on the server, of course, since anyone can pretend to be a client and POST you nasty markdown. And if you're scrubbing the HTML on the server, an attacker can't save the bad HTML, so no one else will see it later and have their cookies stolen or sessions hijacked by the bad script. So there's a valid case to be made that it may not be worth enforcing no-script rules in the WMD preview pane too.

But imagine an attacker found a way to get malicious markdown onto the server (e.g. a compromised feed from another site, or content added before an XSS bug was fixed). Your server-side whitelist applied when translating markdown to HTML would normally prevent that bad markdown from being shown to users. But if the attacker could get someone to edit the page (e.g. by posting another entry saying the malicious entry had a broken link and asking someone to fix it), then anyone who edits the page gets their cookies hijacked. This is admittedly a corner case, but it still may be worth defending against.

Also, it's probably a bad idea for the client preview window to allow different HTML than your server will allow.

StackOverflow has plugged this hole by making changes to WMD. How did they do it?

[NOTE: I already figured this out, but it required some tricky JavaScript debugging, so I'm answering my own question here to help others who may want to do the same thing.]

+1  A: 

One possible fix is in wmd.js, in the pushPreviewHtml() method. Here's the original code from the StackOverflow version of WMD on GitHub:

if (wmd.panels.preview) {
    wmd.panels.preview.innerHTML = text; 
}

You can replace it with some scrubbing code. Here's an adaptation of the code that stackoverflow uses in response to this post, which restricts to a whitelist of tags, and for IMG and A elements, restricts to a whitelist of attributes (and in a specific order too!). See this meta.stackoverflow post for more info on the whitelist.

Note: this code can certainly be improved, e.g. to allow whitelisted attributes in any order. It also disallows mailto: URLs which is probably a good thing on Internet sites but on your own Intranet site it may not be the best approach.

if (wmd.panels.preview) {

    // original WMD code allowed javascript injection, like this:
    //    <img src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" onload="alert('haha');"/>
    // now, we first ensure elements (and attributes of IMG and A elements) are in a whitelist
    // and if not in whitelist, replace with blanks in preview to prevent XSS attacks 
    // when editing malicious markdown
    var okTags = /^(<\/?(b|blockquote|code|del|dd|dl|dt|em|h1|h2|h3|i|kbd|li|ol|p|pre|s|sup|sub|strong|strike|ul)>|<(br|hr)\s?\/?>)$/i;
    var okLinks = /^(<a\shref="(\#\d+|(https?|ftp):\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>)$/i;
    var okImg = /^(<img\ssrc="https?:(\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\swidth="\d{1,3}")?(\sheight="\d{1,3}")?(\salt="[^"<>]*")?(\stitle="[^"<>]*")?\s?\/?>)$/i;
    text = text.replace(/<[^<>]*>?/gi, function (tag) {
        return (tag.match(okTags) || tag.match(okLinks) || tag.match(okImg)) ? tag : "";
    });

    wmd.panels.preview.innerHTML = text;  // original code 
}
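
To see what the filter above actually lets through, here is a minimal sketch that pulls the same three regexes into a standalone function so it can be tried outside WMD. The function name scrubPreviewHtml is an assumption made for this illustration; it is not part of wmd.js.

```javascript
// Same whitelist regexes as in the snippet above.
var okTags = /^(<\/?(b|blockquote|code|del|dd|dl|dt|em|h1|h2|h3|i|kbd|li|ol|p|pre|s|sup|sub|strong|strike|ul)>|<(br|hr)\s?\/?>)$/i;
var okLinks = /^(<a\shref="(\#\d+|(https?|ftp):\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>)$/i;
var okImg = /^(<img\ssrc="https?:(\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\swidth="\d{1,3}")?(\sheight="\d{1,3}")?(\salt="[^"<>]*")?(\stitle="[^"<>]*")?\s?\/?>)$/i;

// Hypothetical helper (name chosen for this example only): drops every tag
// that does not match one of the whitelists and leaves other text untouched.
function scrubPreviewHtml(html) {
    return html.replace(/<[^<>]*>?/gi, function (tag) {
        return (okTags.test(tag) || okLinks.test(tag) || okImg.test(tag)) ? tag : "";
    });
}

// The malicious image from the question is removed entirely.
var bad = 'hi <img src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" onload="alert(\'haha\');"/> there';
var scrubbedBad = scrubPreviewHtml(bad);        // 'hi  there'

// A plain image with whitelisted attributes in the expected order survives.
var good = '<img src="http://example.com/a.png" alt="logo" />';
var scrubbedGood = scrubPreviewHtml(good);      // unchanged
```

Note that only the offending tag is dropped, not its contents or its closing tag: '<a href="javascript:alert(1)">x</a>' comes out as 'x</a>', and '<script>alert(1)</script>' comes out as 'alert(1)'. That is harmless in the preview but worth knowing when comparing against the server's output.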

Also note that this fix is not in the StackOverflow version of WMD on GitHub. Clearly the change was made later and not checked back into GitHub.
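
As for the improvement mentioned in the note above (allowing whitelisted IMG attributes in any order), one possible sketch is to check the overall tag shape first and then validate each attribute on its own. The names imgAttrRules and isOkImgAnyOrder are assumptions made for this example, and the per-attribute patterns are derived from the okImg regex rather than taken from StackOverflow's code. Keep in mind that if the server still enforces a fixed order, a preview that accepts any order will no longer match it.

```javascript
// Hypothetical per-attribute rules, derived from the okImg regex above.
var imgAttrRules = {
    src: /^https?:\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+$/i,
    width: /^\d{1,3}$/,
    height: /^\d{1,3}$/,
    alt: /^[^"<>]*$/,
    title: /^[^"<>]*$/
};

// Hypothetical helper: true if the tag is an IMG whose attributes are all
// whitelisted, each appears at most once, and src is present.
function isOkImgAnyOrder(tag) {
    // Overall shape: <img, one or more name="value" pairs, optional " /", then >
    var shape = /^<img((?:\s[a-z]+="[^"<>]*")+)\s?\/?>$/i.exec(tag);
    if (!shape) {
        return false;
    }
    var seen = {};
    var attrRe = /\s([a-z]+)="([^"<>]*)"/gi;
    var m;
    while ((m = attrRe.exec(shape[1])) !== null) {
        var name = m[1].toLowerCase();
        var known = Object.prototype.hasOwnProperty.call(imgAttrRules, name);
        if (!known || seen[name] === true || !imgAttrRules[name].test(m[2])) {
            return false;   // unknown attribute, duplicate, or bad value
        }
        seen[name] = true;
    }
    return seen.src === true;   // src is required, as in okImg
}

var reordered = isOkImgAnyOrder('<img alt="logo" src="http://example.com/a.png" />');                 // true
var withHandler = isOkImgAnyOrder('<img src="http://example.com/a.png" onload="alert(1)" />');        // false
```

To use it in the preview filter, the tag.match(okImg) test would be swapped for a call to this function; the other two whitelists stay as they are.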

UPDATE: in order to avoid breaking the feature where hyperlinks are auto-created when you type in a URL, you also will need to make changes to showdown.js, like below:

Original code:

var _DoAutoLinks = function(text) {

    text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"<a href=\"$1\">$1</a>");

    // Email addresses: <address@domain.foo>

    /*
        text = text.replace(/
            <
            (?:mailto:)?
            (
                [-.\w]+
                \@
                [-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+
            )
            >
        /gi, _DoAutoLinks_callback());
    */
    text = text.replace(/<(?:mailto:)?([-.\w]+\@[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+)>/gi,
        function(wholeMatch,m1) {
            return _EncodeEmailAddress( _UnescapeSpecialChars(m1) );
        }
    );

    return text;
}

Fixed code:

var _DoAutoLinks = function(text) {
    // use simplified format for links, to enable whitelisting link attributes
    text = text.replace(/(^|\s)(https?|ftp)(:\/\/[-A-Z0-9+&@#\/%?=~_|\[\]\(\)!:,\.;]*[-A-Z0-9+&@#\/%=~_|\[\]])($|\W)/gi, "$1<$2$3>$4");
    text = text.replace(/<((https?|ftp):[^'">\s]+)>/gi, '<a href="$1">$1</a>');
    return text;
}
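
A quick way to check that the fixed version still auto-links bare URLs, and that the anchors it produces pass the link whitelist, is to run it on a few strings. The sketch below repeats the fixed function and the okLinks regex so it is self-contained; the sample inputs are made up for illustration.

```javascript
// The fixed _DoAutoLinks from above, repeated so the example runs on its own.
var _DoAutoLinks = function (text) {
    text = text.replace(/(^|\s)(https?|ftp)(:\/\/[-A-Z0-9+&@#\/%?=~_|\[\]\(\)!:,\.;]*[-A-Z0-9+&@#\/%=~_|\[\]])($|\W)/gi, "$1<$2$3>$4");
    text = text.replace(/<((https?|ftp):[^'">\s]+)>/gi, '<a href="$1">$1</a>');
    return text;
};

// Link whitelist from the preview filter.
var okLinks = /^(<a\shref="(\#\d+|(https?|ftp):\/\/[-A-Za-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>)$/i;

// A bare URL in running text becomes a plain <a href="..."> with no extra attributes.
var linked = _DoAutoLinks("see http://example.com/page for details");
// 'see <a href="http://example.com/page">http://example.com/page</a> for details'

// The generated opening tag is accepted by the whitelist.
var openTag = linked.match(/<a\s[^<>]*>/)[0];
var passesWhitelist = okLinks.test(openTag);    // true

// Trailing sentence punctuation stays outside the link.
var sentence = _DoAutoLinks("Visit http://example.com.");
// 'Visit <a href="http://example.com">http://example.com</a>.'
```

One side effect of the simplification: the e-mail branch is gone, so input such as <someone@example.com> is returned unchanged, which is consistent with the whitelist rejecting mailto: links.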
Justin Grant
I'm not convinced that this is something that needs fixing. It looks like a solution in search of a problem. Maybe the reason you don't see this code in the StackOverflow version of WMD is because it doesn't exist, because it is not needed.
Robert Harvey
Yep, I'm not convinced it's needed either. That said, the StackOverflow.com guys implemented this in order to ensure that the previewer never generated HTML that their server-side validator wouldn't accept. Seems reasonable, although I agree not a terribly high priority. See http://meta.stackoverflow.com/questions/1227/preview-should-match-the-posted-view for more details about why SO did it. BTW I just edited my question to align with the actual reason that SO wanted to do this.
Justin Grant
You could AJAX post the HTML and get back the sanitised HTML from the server, to get a perfect preview.
ceejayoz
+2  A: 

It is not a security issue to allow the local user to execute scripts in the page context as long as it's impossible for any third party to provide the script. Without the editor doing it, the user could always enter a javascript: url while on your page or use Firebug or something similar.

ThiefMaster
@ThiefMaster - initially I agreed with you, but I did find an interesting case: if there were another way for an attacker to get compromised markdown onto the server, then this WMD preview issue would be dangerous. If an attacker could get a site moderator to edit a bad page (e.g. to clear out a broken link), then he could potentially take control of the whole site. True, it'd be easy to defeat by scrubbing markdown on the server before sending it down to the edit page, but output validation is often ignored. This is admittedly a corner case, but may be worth defending against for a secure site.
Justin Grant