I am building my own humble (X)HTML parser. Everything works, except that some DOCTYPE declarations break it. For example:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!ENTITY D "&#x2014;">
<!ENTITY o "&#x2018;">
<!ENTITY c "&#x2019;">
<!ENTITY O "&#x201C;">
<!ENTITY C "&#x201D;">
]>

As far as I know, no other kind of tag is allowed to contain nested markup like this (I mean inside the tag itself), and that includes XML processing instructions and comments.

My question is: what can you say about this? It seems to go against common sense, and yet it is perfectly valid as far as XML is concerned.

Thanks!

+3  A: 

A DOCTYPE declaration isn't a tag, so it doesn't follow the syntax rules for tags. It is well specified, though. (The trick to building a parser is to start from the specification, not from example documents.)
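As a rough illustration of what that means for a hand-written parser, here is a minimal Python sketch that finds the end of a DOCTYPE declaration, including an optional internal subset in square brackets. The function name `find_doctype_end` is made up for this example, and the logic is a simplification of the `doctypedecl` production in the XML spec rather than a full implementation: it skips quoted literals, comments and processing instructions so that a `>` or `]` inside them is not mistaken for the end of the declaration, and it does no validation of what it skips.

```python
def find_doctype_end(text, start=0):
    """Return the index just past the '>' that closes the DOCTYPE
    declaration starting at text[start].

    Simplified sketch: skips quoted literals everywhere, and comments and
    processing instructions inside the [ ... ] internal subset.
    Raises ValueError if the declaration is missing or unterminated.
    """
    if not text.startswith("<!DOCTYPE", start):
        raise ValueError("no DOCTYPE declaration at this position")

    i = start + len("<!DOCTYPE")
    in_subset = False  # True while inside the [ ... ] internal subset

    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            # Quoted literal: jump to the matching quote character.
            end = text.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated quoted literal")
            i = end + 1
        elif in_subset and text.startswith("<!--", i):
            # Comment inside the internal subset: skip it whole.
            end = text.find("-->", i + 4)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 3
        elif in_subset and text.startswith("<?", i):
            # Processing instruction inside the internal subset.
            end = text.find("?>", i + 2)
            if end == -1:
                raise ValueError("unterminated processing instruction")
            i = end + 2
        elif ch == "[" and not in_subset:
            in_subset = True
            i += 1
        elif ch == "]" and in_subset:
            in_subset = False
            i += 1
        elif ch == ">" and not in_subset:
            # Only a '>' outside the internal subset ends the declaration.
            return i + 1
        else:
            i += 1

    raise ValueError("unterminated DOCTYPE declaration")


sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [\n'
    '<!ENTITY D "&#x2014;">\n'
    ']>\n<html/>'
)
end = find_doctype_end(sample)
print(repr(sample[end:]))  # whatever follows the declaration
```

Note that simply counting `<` and `>` is not enough on its own, because both characters may legally appear inside quoted literals and comments; that is why the sketch skips those spans first.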

David Dorward
Thanks! Sure, the spec is what matters, but it's still pretty disturbing. Instead of things getting simpler, now I've got this to deal with, too. Of course, the W3C must have had a real need to introduce this kind of peculiarity. Off I go to the spec.
Albus Dumbledore
Everything's perfectly clear now. Thanks. The spec is not as bad as it looked at first sight.
Albus Dumbledore
+1  A: 

Wow, that's something you don't see every day. Code like that traces its roots all the way back to SGML. SGML gave us all sorts of fun bits that we still use today: CDATA sections (<![CDATA[), processing instructions (the syntax behind <?xml version="1.0"?>), and even PHP's open/close "tags": <?php ... ?>.

All in all, it's nothing to worry about. It's just a set of entity declarations for the parser (the internal DTD subset), though you might be better off accomplishing the same thing by other means. Keep calm and carry on.

mattbasta
Thanks! Well, it certainly needs some thinking over, like counting the number of open brackets, etc.
Albus Dumbledore