I want to inline scripts or CSS into XHTML without escaping special characters.

I can do that using a CDATA marked section.

According to www.w3.org/TR/xhtml1/#h-4.8 the CDATA section can be defined as:

   <script type="text/javascript">
      <![CDATA[
         ... unescaped script content ...
      ]]>
   </script>

Then, according to http://www.w3schools.com/TAGS/tag_script.asp the CDATA can look like:

   <script type="text/javascript"><![CDATA[
     // some code
   //]]></script>

Which method for closing the CDATA section is better? ]]> or //]]> ?

+1  A: 

I would just do it without the //. Those are a throwback to the days when certain browsers (who shall remain nameless) had to be "fooled" into accepting closing brackets in script tags.

Robusto
No, it's nothing like the old issue with `<!--` ... `-->` (which is indeed obsolete). An unadorned `<![CDATA` inside a script element in HTML4 is not elided and a browser *will* trip over it. Try it — you'll get a JavaScript error.
bobince
(who shall remain nameless)... MS-IE anyone??? </hate>
trinithis
A: 

You can put comments before the CDATA tags if you are worried that someone is using a very old browser that doesn't know about XHTML at all. But then you have to put a comment before the starting tag also to prevent it from causing a syntax error:

<script type="text/javascript">
//<![CDATA[
  // some code
//]]>
</script>
Guffa
+1  A: 

Depends on the browser. Despite what some people think, w3schools is not related to the W3C, so their advice is to be taken with a grain of salt.

Modern browsers should be able to recognise CDATA sections. MSIE OTOH doesn't, but that's okay, because it doesn't support XHTML at all (you're not sending XHTML content as text/html for MSIE compatibility, are you? then there'd be not much point to be using XHTML in the first place).

The problem is that browsers that don't fully understand XHTML will treat CDATA directives as regular text.

tl;dr: the full backwards-compatible solution would be something like:

<script type="text/javascript"><!--//<![CDATA[
code goes here...
//]]>--></script>

That is just repulsive. Either put your JS in JS files if you want to keep backwards compatibility, or stick to HTML until you can afford to ignore MSIE 8 (which, going by how many years it took people to shun MSIE 6, might be around the year 2020).

The HTML comment (<!-- -->) is only required for browsers that don't understand the script tags. The double slashes are required for browsers that don't understand CDATA sections (i.e. non-XHTML browsers like MSIE). The CDATA section is required for XHTML to avoid malformed XML (greater-than and less-than comparisons, for example, would break XML otherwise or need escaping, which is again a browser problem).

For more information on the problem with sending XHTML as text/html, read: http://hixie.ch/advocacy/xhtml

EDIT: To correct myself, the full syntax for backwards support would actually be this according to Hixie:

  <script type="text/javascript"><!--//--><![CDATA[//><!--
    ...
  //--><!]]></script>

Thanks, Alohci.

Alan
Given that you mention Hixie's article, I'd have thought you'd have quoted the incantation he provides therein, which is somewhat more complicated than your answer.
Alohci
That incantation is for supporting HTML4 plus XHTML1 plus ancient pre-HTML3.2 browsers that don't understand the `<script>` or `<style>` elements. There are no extant pre-HTML3.2 browsers, and there weren't really at the time I came up with it, either (http://www.doxdesk.com/personal/posts/wd/20010911-cdata.html). It should not be used.
bobince
True, but the point remains. If you're worrying about escaping your CDATA section because you're sending XHTML as HTML to browsers that parse it as tag soup, _you're doing it wrong_. If you're going to use XHTML, don't send it as text/html. If you're using HTML, don't put CDATA sections in it. Apart from that my advice still stands: keep your scripts out of your HTML if you want to make sure no browsers trip over it.
Alan
+12  A: 

According to www.w3.org/TR/xhtml1/#h-4.8 the CDATA section can be defined as: [no //]

Yeah. In XHTML, they can. That is, proper XHTML as read by an XML parser, for example when you serve application/xhtml+xml to a web browser that isn't IE.

But probably you're actually serving as text/html, which means your browser isn't an ‘XML processor’ as referenced in that section. It's a legacy-HTML4 parser, so you have to abide by the appendix C guidelines and avoid any XML features that don't work in HTML4.

In particular, the strings <![CDATA[ and ]]> in a <script> or <style> block are not special to an HTML4 parser, because in HTML4 those two elements are ‘CDATA elements’ where markup doesn't apply (except for the </ ETAGO sequence to end the element itself). So an HTML4 parser will send those strings straight to the CSS or JavaScript engine.

Because <![CDATA[ is not valid JS, you'll get a JavaScript syntax error. (The other answers are wrong here: it's not just very old browsers, but all HTML4 browsers, that will give errors for an uncommented CDATA section in script.)
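If you'd rather check this outside a browser, here is a minimal sketch. It assumes an engine such as Node where `new Function` is available; `compiles` is a hypothetical helper name, not a standard API. It only asks the JavaScript engine whether a string parses; nothing is executed.

```javascript
// Hypothetical helper: reports whether a string parses as a JavaScript function body.
// new Function() throws (a SyntaxError) when the body does not parse; it never runs the code.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// What an HTML4 parser hands to the JS engine for an uncommented CDATA section.
var uncommented = "<![CDATA[\n  alert('a&b');\n]]>";

// The same content with both delimiters hidden behind // line comments.
var commented = "//<![CDATA[\n  alert('a&b');\n//]]>";

console.log(compiles(uncommented)); // false
console.log(compiles(commented));   // true
```

The first string fails because `<` cannot start a statement; in the second, both delimiter lines are ordinary comments as far as the engine is concerned.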

You use the // or /* comment markup to hide the content from the JavaScript or CSS engine. So:

<script type="text/javascript">//<![CDATA[
    alert('a&b');
//]]></script>

(Note the leading //; this was omitted in the W3Schools example code, and makes that example code not work at all. Fail. Don't trust W3Schools: they are nothing to do with W3C and their material is often rubbish.)

This is read by an HTML parser as:

  • Open-tag script establishing CDATA content until the next ETAGO
  • Text //<![CDATA[\n alert('a&b');\n//]]>
  • ETAGO and close-tag script
  • -> resultant content sent to JavaScript engine: //<![CDATA[\nalert('a&b');\n//]]>

But by an XML parser as:

  • Open-tag script (no special parsing implications)
  • Text content //
  • Open CDATA section establishing CDATA content until the next ]]> sequence
  • Text \n alert('a&b');\n//
  • Close CDATA section
  • Close-tag script
  • -> resultant content sent to JavaScript engine: //\nalert('a&b');\n//

Whilst the parsing process is quite different, the JS engine ends up with the same effective code in each case, as thanks to the //s the only difference is in the comments.
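The two readings can be modelled in a few lines. This is only a rough sketch: `scriptTextAsHtml`, `scriptTextAsXml` and `stripLineComments` are hypothetical helpers written for this illustration, not real parsers, and they ignore entity references, attributes containing `>`, and `//` inside string literals.

```javascript
// Simplified model of an HTML4 parser: <script> content is CDATA, so everything
// up to the first "</" (ETAGO) is passed through untouched.
function scriptTextAsHtml(markup) {
  var start = markup.indexOf(">") + 1;
  var end = markup.indexOf("</", start);
  return markup.slice(start, end);
}

// Simplified model of an XML parser: the CDATA delimiters are markup and are
// dropped, leaving only the character data around and inside the section.
function scriptTextAsXml(markup) {
  var start = markup.indexOf(">") + 1;
  var end = markup.lastIndexOf("</");
  return markup.slice(start, end)
    .split("<![CDATA[").join("")
    .split("]]>").join("");
}

// Drops // line comments. Naive: it would also cut "//" inside a string literal.
function stripLineComments(code) {
  return code.split("\n").map(function (line) {
    var i = line.indexOf("//");
    return i === -1 ? line : line.slice(0, i);
  }).join("\n");
}

var markup = "<script type=\"text/javascript\">//<![CDATA[\n" +
             "    alert('a&b');\n" +
             "//]]></script>";

var asHtml = scriptTextAsHtml(markup);
var asXml = scriptTextAsXml(markup);

console.log(JSON.stringify(asHtml)); // text an HTML4 parser would pass on
console.log(JSON.stringify(asXml));  // text an XML parser would pass on
console.log(stripLineComments(asHtml) === stripLineComments(asXml)); // true
```

Once the comments are stripped, both readings leave the same `alert` line, which is the whole point of the commented-CDATA form.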

Note this is a very different case to the old-school:

<script type="text/javascript"><!--
    alert('a&b');
//--></script>

which was to hide script/style content so that it didn't get written onto the page in browsers that didn't understand <script> and <style> tags. This will not generate a JavaScript/CSS error, because a hack was put in at a different level: it is a syntactical feature of the CSS and JavaScript languages themselves that <!-- is defined to do nothing, allowing this hack to work.

Those browsers are ancient history; you absolutely should not use this technique today. Especially in XHTML, as an XML parser would take you at your word, turning the whole script block into an XML comment instead of executable code.
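Purely to illustrate why (not as something to copy), both halves of that argument can be sketched as below. It assumes an engine such as Node, where `new Function` parses its body as classic, non-module script code; `compiles` and `dropXmlComments` are hypothetical helpers, and the regular expression is a crude stand-in for an XML parser's comment handling.

```javascript
// Legacy "hidden script" content, as it sits between <script> and </script>.
var legacy = "<!--\n    alert('a&b');\n//-->";

// JavaScript side: a leading <!-- is treated like a one-line comment in classic
// (non-module) scripts, so this still parses. new Function() does not run it.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

// XML side (simplified model): <!-- ... --> is a real comment and is discarded.
function dropXmlComments(text) {
  return text.replace(/<!--[\s\S]*?-->/g, "");
}

console.log(compiles(legacy));               // true
console.log(dropXmlComments(legacy) === ""); // true
```

So the JavaScript engine is happy with the legacy form, but under this model an XML parser leaves nothing at all for it to run.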

I want to inline scripts or CSS into XHTML without escaping special characters.

Avoid doing this and you will be much happier.

Do you really need the < and & characters in a <style>? No, almost never. Do you really need them in <script>? Well... sometimes, yeah, and in that case the commented-CDATA-section is acceptable.

But to be honest, XHTML compatibility guideline C.4 is as applicable to HTML4 as it is to XHTML1: anything non-trivial should be in an external script, and then you don't have to worry about any of this.

bobince
"Avoid doing this and you will be much happier." Seconded.
D_N
Really nice answer bobince! Thanks!
AlexV
Thanks for this detailed answer. It clearly responds to my question.
Marc