A CDATA section exists to allow literal text that would otherwise be interpreted specially in an XML document, that is, something that looks like an entity reference or like XML tags. Anything in a CDATA section can also appear in valid XML without one; you just need to use entity references to encode the special characters so that they are treated not as XML markup but as character data that is the value of an element.
So yes, the following is perfectly valid, as long as it is what you intend:
<?xml version="1.0" encoding="UTF-8" ?>
<outer>
<inner><![CDATA[&copy;]]></inner>
</outer>
Here, the value of the inner element is the literal text &copy; which will not be interpreted by the XML parser as the entity reference for the copyright symbol. You can also do the following:
<?xml version="1.0" encoding="UTF-8" ?>
<outer>
<inner><![CDATA[<normally> this looks <like/> & xml </normally>]]></inner>
</outer>
where the value of the inner element is
<normally> this looks <like/> & xml </normally>
To do this without a CDATA section:
<?xml version="1.0" encoding="UTF-8" ?>
<outer>
<inner>&lt;normally&gt; this looks &lt;like/&gt; &amp; xml &lt;/normally&gt;</inner>
</outer>
which is much less human-readable, but equivalent as far as an XML parser is concerned. If you did this (assuming that the inner element is defined in a schema or DTD as containing a string and not XML) then your XML parser will complain:
<?xml version="1.0" encoding="UTF-8" ?>
<outer>
<inner><normally> this looks <like/> &amp; xml </normally></inner>
</outer>
So you use CDATA or entity escaping to protect the special characters from the XML parser, so that the client of the XML data gets the value of inner, which happens to contain XML markup characters.
Note: To be clear, the above example is well-formed XML, but if the schema or DTD says that the element inner contains xsd:string or equivalent, then it is an invalid XML document.
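A quick way to check the equivalence claimed above is to parse both forms and compare the results. The following is only a sketch using Python's standard xml.etree.ElementTree; the helper name inner_text and the variable names are made up for illustration, and the parser does no schema or DTD validation.

```python
import xml.etree.ElementTree as ET

# The same character data written with a CDATA section and with entity escaping.
cdata_doc = """<outer>
<inner><![CDATA[<normally> this looks <like/> & xml </normally>]]></inner>
</outer>"""

escaped_doc = """<outer>
<inner>&lt;normally&gt; this looks &lt;like/&gt; &amp; xml &lt;/normally&gt;</inner>
</outer>"""

# Unprotected markup: well-formed, but <inner> now holds child elements.
markup_doc = """<outer>
<inner><normally> this looks <like/> &amp; xml </normally></inner>
</outer>"""


def inner_text(document: str) -> str:
    """Return the character data of the first <inner> element (hypothetical helper)."""
    return ET.fromstring(document).find("inner").text


cdata_value = inner_text(cdata_doc)
escaped_value = inner_text(escaped_doc)
child_tags = [child.tag for child in ET.fromstring(markup_doc).find("inner")]

print(cdata_value)                   # the literal markup-looking string
print(cdata_value == escaped_value)  # both spellings give the same value
print(child_tags)                    # the unprotected form yields real elements
```

Without the protection, the client no longer sees a string at all: inner contains a normally element instead, which is what a validating parser would reject against a string-typed schema.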
And no, HTML or XHTML entities that are not predefined by XML itself are not valid in XML unless they are declared (for example in a DTD). Your XML parser will return an error.
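To see that error in practice, here is a minimal sketch with Python's xml.etree.ElementTree. The exact error message depends on the parser; the point is only that an undeclared entity such as copy is rejected.

```python
import xml.etree.ElementTree as ET

# &copy; is an HTML entity; it is not declared anywhere in this document.
doc = "<outer><inner>&copy;</inner></outer>"

try:
    ET.fromstring(doc)
    error = None
except ET.ParseError as exc:
    # The parser refuses the document instead of guessing what &copy; means.
    error = exc

print(error)
```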
Eddie gave a good reply; I will just add a few points that he apparently did not mention.
<?xml version="1.0" encoding="UTF-8" ?>
<outer>
<inner>&copy;</inner>
</outer>
is not legal (the entity "copy" is not predefined; in XML only "lt", "gt", "amp", "apos" and "quot" are).
<?xml version="1.0" encoding="UTF-8" ?>
<outer>
<inner>&#169;</inner>
</outer>
is perfectly legal and probably gives what you want (a copyright symbol).
<?xml version="1.0" encoding="UTF-8" ?>
<outer>
<inner><![CDATA[&copy;]]></inner>
</outer>
is also perfectly legal but yields a quite different result (the element <inner> will contain six Unicode characters, instead of one in the previous example).
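The difference between the numeric character reference and the CDATA form is easy to check by counting characters. This is a small sketch using Python's xml.etree.ElementTree; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# A numeric character reference is resolved by the parser to one character.
char_ref_value = ET.fromstring(
    "<outer><inner>&#169;</inner></outer>"
).find("inner").text

# Inside CDATA nothing is resolved, so the six characters stay as written.
cdata_value = ET.fromstring(
    "<outer><inner><![CDATA[&copy;]]></inner></outer>"
).find("inner").text

print(repr(char_ref_value), len(char_ref_value))  # '©' 1
print(repr(cdata_value), len(cdata_value))        # '&copy;' 6
```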
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE outer[
<!ENTITY copy "&#169;">
]>
<outer>
<inner>&copy;</inner>
</outer>
is legal, too, and gives the same result as the second example. It can save you from typing characters that you use often but that are not easy to produce with your keyboard/editor.
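As a sketch of the declared-entity case, the snippet below parses a document whose internal DTD subset declares copy. It assumes a parser that expands internal entities, which Python's xml.etree.ElementTree (built on expat) does for small declarations like this one.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares "copy", so &copy; is now a known entity.
doc = """<!DOCTYPE outer [
<!ENTITY copy "&#169;">
]>
<outer>
<inner>&copy;</inner>
</outer>"""

declared_value = ET.fromstring(doc).find("inner").text

# The reference is replaced by the entity's replacement text, a single character.
print(repr(declared_value), len(declared_value))
```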
<?xml version="1.0" encoding="UTF-8" ?>
<outer>
<inner>©</inner>
</outer>
is legal, too (because of encoding="UTF-8"; with encoding="US-ASCII" it would have been impossible), and gives the same result, provided that your keyboard/editor lets you enter this character directly.
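The dependence on the declared encoding can be sketched as follows, again with Python's xml.etree.ElementTree. The same bytes are parsed twice, once declared as UTF-8 and once declared as US-ASCII; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

body = "<outer><inner>\u00a9</inner></outer>"

# Declared UTF-8 and actually encoded as UTF-8: the literal character is fine.
utf8_doc = ('<?xml version="1.0" encoding="UTF-8" ?>' + body).encode("utf-8")
utf8_value = ET.fromstring(utf8_doc).find("inner").text

# Same bytes, but the declaration claims US-ASCII, which has no such character.
ascii_doc = ('<?xml version="1.0" encoding="US-ASCII" ?>' + body).encode("utf-8")
try:
    ET.fromstring(ascii_doc)
    ascii_error = None
except ET.ParseError as exc:
    ascii_error = exc

print(repr(utf8_value))
print(ascii_error)
```

With a US-ASCII declaration, the numeric character reference or a declared entity remains the way to get the copyright symbol into the document.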