It seems that a loose definition of PCDATA and CDATA is:

1) PCDATA is character data, but is to be parsed.
2) CDATA is character data, and is not to be parsed.

But then someone told me that CDATA is actually parsed, or that PCDATA is actually not parsed, so I am a bit confused. Does anyone know what the real deal is?

Update: I actually added the PCDATA definition on Wikipedia, so don't take that answer too seriously, as it is only my rough understanding.

+3  A: 

PCDATA - Parsed Character Data

CDATA - (Unparsed) Character Data

http://www.w3schools.com/XML/xml_cdata.asp

AndrewS
A: 

Your first definition is correct.

PCDATA is parsed, which means that entities are expanded and that markup inside the text is recognized as markup. CDATA is not parsed by an XML parser.
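A minimal sketch of that difference, assuming Python's standard `xml.etree.ElementTree` as the parser (the `<doc>` and `<b>` element names are made up for illustration). The same characters are fed once as ordinary element content and once inside a CDATA section:

```python
import xml.etree.ElementTree as ET

# PCDATA: the entity reference is expanded and <b> is treated as a child element.
pcdata = ET.fromstring("<doc>1 &lt; 2 <b>bold</b></doc>")
pcdata_text = pcdata.text                      # text before the <b> child
pcdata_children = [child.tag for child in pcdata]

# CDATA section: the same characters reach the application literally.
cdata = ET.fromstring("<doc><![CDATA[1 &lt; 2 <b>bold</b>]]></doc>")
cdata_text = cdata.text
cdata_children = [child.tag for child in cdata]

print(repr(pcdata_text), pcdata_children)
print(repr(cdata_text), cdata_children)
```

In the first case the text comes back as `1 < 2 ` with one `b` child; in the second the whole string, including `&lt;` and `<b>`, comes back as plain text with no children.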

Ronald Wildenberg
+11  A: 

From Wikipedia:

PCDATA

Simply speaking, PCDATA stands for Parsed Character Data. That means the characters are to be parsed by the XML, XHTML, or HTML parser. (&lt; will be changed to <, <p> will be taken to mean a paragraph tag, etc). Compare that with CDATA, where the characters are not to be parsed by the XML, XHTML, or HTML parser.

CDATA

The term CDATA, meaning character data, is used for distinct, but related purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

Ólafur Waage
+3  A: 

Both PCDATA and CDATA are parsed. They are both character data.

They both must only include valid characters. For example, if your document encoding is UTF-8, the content of CDATA sections must still be valid UTF-8 characters, so random binary data will probably prevent the document from being well-formed. CDATA sections are also still parsed, if only to find the end-of-section delimiter (]]>). But other markup-like characters, like <, > and &, are not interpreted and are passed through as-is by the parser.
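A rough way to check this, assuming Python's `xml.etree.ElementTree` (the `is_well_formed` helper is hypothetical, written just for this example): markup-like characters are accepted inside a CDATA section, but a character that XML 1.0 does not allow, such as the control character U+0001, is still rejected there.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# <, > and & need no escaping inside a CDATA section.
markup_like = is_well_formed("<doc><![CDATA[<, > and & are fine here]]></doc>")

# A character outside the XML 1.0 character range is checked even in CDATA.
control_char = is_well_formed("<doc><![CDATA[\x01]]></doc>")

print(markup_like, control_char)
```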

OTOH, in PCDATA a literal < or & (and ' or " in attribute values) must be escaped, or it will be interpreted as markup. Entities will also be expanded.
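As a sketch of the escaping rule, again assuming Python's standard library (`xml.sax.saxutils.escape` replaces &, < and > with entity references; the `parses` helper and the sample string are made up for illustration):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "a < b & c"

def parses(xml_text):
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Unescaped < and & in element content are read as the start of markup.
unescaped_ok = parses(f"<doc>{raw}</doc>")

# Escaped, the same text parses, and the entities are expanded back on read.
escaped_xml = f"<doc>{escape(raw)}</doc>"
escaped_ok = parses(escaped_xml)
round_trip = ET.fromstring(escaped_xml).text

print(unescaped_ok, escaped_ok, repr(round_trip))
```

The unescaped document is rejected, while the escaped one parses and yields the original string again.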

So yes, CDATA sections are indeed parsed. I am not sure why you were told that PCDATA is not parsed though.

mirod