ansaurus

Question

Escaping JavaScript[/CSS] between <script>[/<style>] tags: Insights on a potentially broken status quo

Answer 1

+5 A:

Because it is very common to want to use characters such as & and < in scripts, and escaping them is a pain.

On the flip side, <script> and <style> can't have child elements, so there is no need to make it easy to include a tag.

The result: HTML defines <script> and <style> as containing CDATA in the DTD, so you don't need to mark it up manually in the document, which makes life easier.

XHTML is different. In many ways XML is simpler than SGML, and its DTDs don't (as far as I know) have that facility. Hence, you need to be explicit about CDATA markers (or use entities) in XHTML. The only reason it is a "clusterfuck" is because people claim their XHTML is HTML by serving it with a text/html content-type (instead of the correct application/xhtml+xml).
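As a rough sketch of what "being explicit" means in practice: the helper below wraps script source in a CDATA section for a document served as application/xhtml+xml. The name `xhtmlInlineScript` is made up for illustration and is not part of any library. The sketch simply refuses source that contains `]]>`.

```javascript
// Hypothetical helper: wrap inline script source for an XHTML document,
// where the CDATA section has to be marked explicitly.
function xhtmlInlineScript(source) {
  // "]]>" would end the CDATA section early, so this sketch rejects it.
  if (source.includes("]]>")) {
    throw new Error('source contains "]]>" and cannot go in one CDATA section');
  }
  return (
    '<script type="text/javascript"><![CDATA[\n' +
    source +
    "\n]]></script>"
  );
}

const xhtmlSnippet = xhtmlInlineScript("if (2 < 3 && 5 > 4) alert('dude');");
console.log(xhtmlSnippet);
```

In HTML 4 the same script could sit between the tags without the markers, because the DTD already declares the content as CDATA.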

As for intrinsic event attributes, SGML doesn't make it possible to say that special characters should not be treated as such, but when they are used they shouldn't contain much more than a function call … and are better avoided in favour of unobtrusive JS anyway.

David Dorward 2010-09-15 19:01:34

You're right about CDATA being an XHTML(/XML) thing. I'll edit that into my question, since I do know it, but I've apparently managed to omit it. D'oh. :) Will respond to the other points in a second (but definitely thanks for responding, and you're probably on to something).

pinkgothic 2010-09-15 19:07:03

Right, back to you. :D I didn't realise HTML defines the contents of those tags as `CDATA`; I just thought that JS-blocks passing validation in HTML came from SGML being more lenient than XML! Learn something new every day. (Where is that +2 when you need it?) The other stuff I've sort of rambled to death in MooGoo's general direction, so I'm not going to repeat them here (if that's okay).

pinkgothic 2010-09-15 19:19:52

I think the main reason is that `<script>` and `<style>` elements don't need children, so removing the need to escape everything manually (or put it in `CDATA`) seemed logical because you would never *actually* want to write a real tag inside a script.

musicfreak 2010-09-15 19:26:01

@musicfreak: Ooh, great comment for emphasis. Now I'm seeing David's second paragraph in a wholly different light.

pinkgothic 2010-09-15 19:30:11

Answer 2

+2 A:

Because in JavaScript you are constantly using characters that would need to be escaped in HTML. That is the point of having CDATA after all, isn't it?

Tell me which you think looks more reasonable:

if (5 &gt; 4 &amp;&amp; 2 &lt; 3) alert('dude');

Or

if (5 > 4 && 2 < 3) alert('dude');
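To make the comparison concrete, here is a minimal sketch of the escaping step that turns the second form into the first. The `escapeHtml` function is an illustrative assumption, not a complete sanitizer; it only handles the three characters that matter in this example.

```javascript
// Minimal HTML text escaper (illustrative only).
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const plain = "if (5 > 4 && 2 < 3) alert('dude');";
const escaped = escapeHtml(plain);
console.log(escaped);
// prints: if (5 &gt; 4 &amp;&amp; 2 &lt; 3) alert('dude');
```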

Also, in the vast majority of cases, both CSS and JavaScript should be included as links to separate files, rather than inlined in HTML, thus avoiding the escaping issue entirely.

MooGoo 2010-09-15 19:05:57

"Also in the vast majority of cases, both CSS and Javascript should be included as links to separate files, rather than inlined in HTML, thus avoiding the escaping issue entirely." I completely agree. :) But I don't feel 'constantly using characters that should be escaped' is ever a valid reason not to escape something - yet, the more I think about it, the more that's probably what happened when it was defined. :/ Do you have an idea why CSS does it? It's far less frequent there.

pinkgothic 2010-09-15 19:10:42

By the way, simply because it's probably not what you expect me to say, but nonetheless true, I consider the former more reasonable - *in an HTML context*. To me, escaped data will always look more reasonable, that's the aforementioned OCD coming through (even if it is slightly tongue-in-cheek). ;) Nonetheless, I do understand what you're getting at. And agree that it's probably the motivation.

pinkgothic 2010-09-15 19:15:25

I would say that `<![CDATA[ ... ]]>` simply changes what characters need to be escaped. As you said, any instance of `]]>` would break validation, so if it were to be included in the JS code, it would have to be represented in a way that does not conflict with the "host" language, thus...escaping it. The probability of `]]>` appearing in most JavaScript code is pretty low, so it is a reasonable trade-off.
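One common way to represent `]]>` without conflicting with the host language is to split the CDATA section at that point, so the terminator never appears inside a single section. The sketch below assumes a made-up helper name, `toCdata`; it is one possible approach, not the only one.

```javascript
// Hypothetical helper: wrap arbitrary text in CDATA, splitting any "]]>"
// across two adjacent sections so no section contains the terminator.
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

const wrapped = toCdata("if (a[b[0]]>1) go();");
console.log(wrapped);
// prints: <![CDATA[if (a[b[0]]]]><![CDATA[>1) go();]]>
```

An XML parser reads the two adjacent sections back as the original text, so the script itself is unchanged.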

MooGoo 2010-09-15 19:28:06

MooGoo 2010-09-15 19:34:45

'so anything to make it easier and less error-prone is a good thing I think' - that's kind of why the 'oh we won't escape it here!' bugs me, though, because it starts adding exceptions to what is otherwise a pretty clear rule. (I don't like HEREDOC either, to be honest.) And I can live with how it is now (especially after your and David's helpful answers), but it's hard not to consider it hack... -ish.

pinkgothic 2010-09-15 19:41:06

Pre-processing the HTML server-side is actually something I wanted to do, that got foiled by this behaviour. I thought it might be an option to HTML-escape ), (, } and { (and only those) across the entire output-document to prevent cross-origin CSS attacks, but that failed because of JS blocks. Well. Things in life are never that easy. ;) And that would have broken if we'd had non-implicit CDATA blocks somewhere, so it's probably good that *this* threw a wrench in the works. It just surprised me.
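A small sketch of why that blanket approach runs into JS blocks. The `escapeBrackets` filter and the sample page are made up for illustration. In ordinary text the browser turns the character references back into brackets, but in HTML the content of a script element is not entity-decoded, so the script engine receives the references literally.

```javascript
// Hypothetical blanket filter: replace ( ) { } with numeric character references.
function escapeBrackets(html) {
  return html.replace(/[(){}]/g, (ch) => "&#" + ch.charCodeAt(0) + ";");
}

const page = "<p>f(x)</p><script>alert(1);</script>";
const filtered = escapeBrackets(page);
console.log(filtered);
// prints: <p>f&#40;x&#41;</p><script>alert&#40;1&#41;;</script>
```

The paragraph still renders as f(x), while the script body is now `alert&#40;1&#41;;`, which is not valid JavaScript.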

pinkgothic 2010-09-15 19:44:17

Yeah...clear rules often seem to give way to convenience; such is life, especially in the realm of computers. I hear what you are saying about reasonable looking HTML. To me, almost any CSS/JS in HTML is ugliness. Perfectly indented HTML on the other hand is a thing of beauty.

MooGoo 2010-09-15 19:45:07
