I was playing around with a Python-based HTML parser and ran it on Stack Overflow. The parser puked on one line with

HTMLParser.HTMLParseError: bad end tag: "</'+'scr'+'ipt>", at line 649, column 29

The error points to the following lines of javascript in the site's source:

<script type="text/javascript">
    document.write('<s'+'cript lang' + 'uage="jav' + 'ascript" src=" [...] ">'); 
    document.write('</'+'scr'+'ipt>');
</script>

([...] replaces a long link, removed for simplicity)

Out of curiosity, is there a specific reason for what looks to me like artificial 'obfuscation' of the code, i.e. why use the document.write method to concatenate all the chopped up strings?

A: 

Perhaps it's there to stop programs that search specifically for script tags. Ad blockers, for example, look for script tags and object tags.

Gabriel McAdams
Funny how the most upvoted answer and the most downvoted answer are the same in this question. :)
carl
+5  A: 

I think it's to fight adblockers.

... + 'uage="jav' + 'ascript" src="http://ads.stackoverflow.com
Derek Illchuk
Yeah, this looks like it because it's in the ad section of the page. And they even obfuscated the div class name to not sound ad-like. :)
Jeff
+2  A: 

It has been written that way so the browser doesn't think it's the closing tag for <script>, which would cause some problems.

kiamlaluno
This doesn't explain why the top line is chopped up, though.
Anon.
Because you cannot write a tag `<script>` inside another one.
kiamlaluno
The browser doesn't care what's inside a `<script>` tag - it just ignores everything from that point on until it sees the magic characters '<', '/', 's', 'c', 'r', 'i', 'p', 't', '>'.
Anon.
The '<script>' tag is probably zapped because it upsets the syntax highlighting in whatever editor/IDE they are using.
James Anderson
That is not exact; the browser takes what is between the <script> and </script> tags and passes it to the JavaScript parser, which would not understand a bare <script> it found there. As it is not written between quotes, it would not even be interpreted as a string. Browsers are not supposed to handle nested script tags, and they don't handle them.
kiamlaluno
+1  A: 

When the HTML parser encounters document.write('</script>');, it thinks it has found the end of the enclosing <script> tag. Breaking the tag up stops the parser from recognising the closing tag.

The other way I've seen this achieved is by escaping the slash, i.e. document.write('<\/script>');.
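To make the point concrete, here is a rough sketch of what the HTML parser "sees". The function name scriptBodySeenByHtmlParser is made up for illustration, and the scan is a deliberate simplification: a real HTML tokenizer has extra rules (comment-like escapes, what may follow the tag name), but the core idea of stopping at the first "</script" regardless of JavaScript quoting is the same.

```javascript
// Simplified model (an assumption, not a real tokenizer): the HTML parser
// ends a <script> element at the first "</script" it finds, case-insensitively,
// and knows nothing about JavaScript string literals.
function scriptBodySeenByHtmlParser(html) {
  const lower = html.toLowerCase();
  const open = lower.indexOf('<script>');
  if (open === -1) return null;
  const start = open + '<script>'.length;
  const end = lower.indexOf('</script', start);
  return end === -1 ? html.slice(start) : html.slice(start, end);
}

const naive = "<script>document.write('</script>');</script>";
const split = "<script>document.write('</'+'scr'+'ipt>');</script>";
const escaped = "<script>document.write('<\\/script>');</script>";

console.log(scriptBodySeenByHtmlParser(naive));   // document.write('
console.log(scriptBodySeenByHtmlParser(split));   // document.write('</'+'scr'+'ipt>');
console.log(scriptBodySeenByHtmlParser(escaped)); // document.write('<\/script>');

// Inside a JavaScript string literal "\/" is just "/", so both workarounds
// still write the same text at run time.
console.log('<\/script>' === '</' + 'scr' + 'ipt>'); // true
```

In the naive case the script body is cut off in the middle of the string literal, which is exactly the breakage the chopped-up or escaped form avoids.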

The correct way to do this is either:

  • Enclose the body of the script in a <![CDATA[ ... ]]> block (if serving XHTML), or
  • Put the script in an external file, or
  • Use the DOM API instead (i.e. create a script node and append that to the document head)
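
The DOM API option could look roughly like the sketch below. The helper name appendScript and the example URL are placeholders, not anything from the page in question; the document object is passed in as a parameter only so the function can be tried outside a browser.

```javascript
// Hypothetical helper: adds a <script src="..."> element through the DOM,
// so no "</script>" text ever has to appear inside the inline script.
function appendScript(doc, src) {
  const script = doc.createElement('script');
  script.src = src;
  doc.head.appendChild(script);
  return script;
}

// In a browser (placeholder URL):
// appendScript(document, 'https://example.com/ad.js');
```

One difference from document.write worth keeping in mind: a script added this way loads asynchronously by default, so code that depends on it should not assume it has already run.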
harto
CDATA blocks won't help you here. Not if document is served and rendered as HTML.
kangax
@kangax - Duly noted, thanks
harto