What is the difference between #PCDATA and #CDATA in DTD?

+1  A: 

From here (Google is your friend):

In a DTD, PCDATA and CDATA are used to assert something about the allowable content of elements and attributes, respectively. In an element's content model, #PCDATA says that the element contains (may contain) "any old text." (With exceptions as noted below.) In an attribute's declaration, CDATA is one sort of constraint you can put on the attribute's allowable values (other sorts, all mutually exclusive, include ID, IDREF, and NMTOKEN). An attribute whose allowable values are CDATA can (like PCDATA in an element) contain "any old text."
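As a rough sketch of where each keyword goes (the `note` element, its `lang` attribute, and the sample values are made up for illustration), the internal DTD subset below uses #PCDATA in an element's content model and CDATA as an attribute type. Python's standard `xml.etree.ElementTree` is a non-validating parser, so it reads the declarations without enforcing them; the example only shows the syntax and what the application gets back.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "note" and "lang" are illustrative names only.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note lang CDATA #IMPLIED>
]>
<note lang="any old text">Any old text goes here.</note>"""

note = ET.fromstring(DOC)

# #PCDATA governs the element's content; CDATA governs the attribute's value.
element_text = note.text
attribute_value = note.get("lang")

print(element_text)
print(attribute_value)
```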

A potentially really confusing issue is that there's another "CDATA," also referred to as marked sections. A marked section is a portion of element (#PCDATA) content delimited with special strings: <![CDATA[ to open the section, and ]]> to close it. If you remember that PCDATA is "parsed character data," a CDATA section is literally the same thing, without the "parsed." Parsers transmit the content of a marked section to downstream applications without hiccupping every time they encounter special characters like < and &. This is useful when you're coding a document that contains lots of those special characters (like scripts and code fragments); it's easier on data entry, and easier on reading, than the corresponding entity references.

So you can infer that the exception to the "any old text" rule is that PCDATA cannot include any of these unescaped special characters, UNLESS they fall within the scope of a CDATA marked section.
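To make that concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` (the `code` element and the sample text are invented for the example). It writes the same text three ways: with entity references, inside a CDATA marked section, and unescaped. The first two should give the application identical text; the third should be rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# The same text written three ways; "code" is an illustrative element name.
escaped = "<code>if (a &lt; b &amp;&amp; c) run();</code>"
cdata = "<code><![CDATA[if (a < b && c) run();]]></code>"
raw = "<code>if (a < b && c) run();</code>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# Both forms deliver the same character data to the application.
print(escaped_text == cdata_text)

# Unescaped < and & outside a CDATA section make the document ill-formed.
try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)
```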

Oli
+2  A: 

A Good reference.

PCDATA - Parsed Character Data

XML parsers normally parse all the text in an XML document.

CDATA - (Unparsed) Character Data

The term CDATA is used about text data that should not be parsed by the XML parser.

Characters like "<" and "&" are illegal in XML element content unless they are escaped (as &lt; and &amp;) or placed inside a CDATA section.

Matthew Vines