I have been trying to confirm my reading of the XML spec. My interpretation is that predefined entities and numeric character references are not allowed in tag names or attribute names. For example, I believe this is not allowed by the XML 1.0 spec:

<root>
<test&apos;&#x27;&#39;tag test&apos;&#x27;&#39;attribute="one"/>
</root>

However, I have one parser that returns test'''tag for the tag name and test'''attribute for the attribute name while another parser returns test&apos;&#x27;&#39;tag for the tag name and test&apos;&#x27;&#39;attribute for the attribute name.

Which parser is correct? Or are they both wrong (i.e., should they throw a well-formedness error)?

Thanks!

A: 

In digging around at w3.org, I found the following relevant pieces:

[41] Attribute ::= Name Eq AttValue [VC: Attribute Value Type] [WFC: No External Entity References] [WFC: No < in Attribute Values]

[WFC: No External Entity References] links to:

Well-formedness constraint: No External Entity References
Attribute values MUST NOT contain direct or indirect entity references to external entities.

Name links to:

[5] Name ::= NameStartChar (NameChar)*

[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

Yes, it's as clear as mud! My interpretation of this would be that you could use hex entity references as long as they fell in the ranges specified above but that you could not use pre-defined references.

I would expect a well-formedness error when the names don't conform to this.
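For what it's worth, productions [4], [4a] and [5] quoted above can be turned into a regular expression to check candidate names mechanically. This is only a rough sketch: the names `NAME_RE` and `is_xml_name` are made up for illustration, and it tests the literal characters of a name without expanding any references first.

```python
import re

# NameStartChar, production [4] as quoted above.
NAME_START_CHAR = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar, production [4a]: NameStartChar plus a few extra characters.
NAME_CHAR = NAME_START_CHAR + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# Name, production [5]: one NameStartChar followed by any number of NameChars.
NAME_RE = re.compile(f"[{NAME_START_CHAR}][{NAME_CHAR}]*")


def is_xml_name(candidate):
    """Return True if the literal string matches the Name production."""
    return NAME_RE.fullmatch(candidate) is not None


print(is_xml_name("test"))           # True
print(is_xml_name("test&apos;tag"))  # False: '&' and ';' are not NameChars
print(is_xml_name("test'''tag"))     # False: the apostrophe (#x27) is not a NameChar either
```

Note that neither form of the name from the question passes: the raw text fails because of the '&' and ';', and the "expanded" form fails because the apostrophe is outside every range listed.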

17 of 26
ScottProuty
I certainly have never seen them used in such a way and personally would avoid it.
17 of 26
A: 

It seems to me that they are both wrong. According to the spec, only the following characters may appear in a tag name:

":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

XMLSpy certainly isn't happy with it either. Nor <Oxygen/>.

And...just for good measure...here's what .NET had to say about it:

The '&' character, hexadecimal value 0x26, cannot be included in a name. Line 1, position 12.
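If you want another data point, here is a quick sketch that feeds the sample document to Python's standard-library parser (which is built on expat). The helper name `well_formedness_error` is just something I made up for this check; the exact wording of the error message depends on the parser, so I haven't quoted it.

```python
import xml.etree.ElementTree as ET

# The document from the question, unchanged.
DOC = """<root>
<test&apos;&#x27;&#39;tag test&apos;&#x27;&#39;attribute="one"/>
</root>"""


def well_formedness_error(text):
    """Return the parser's error message, or None if the text parses cleanly."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# A non-None result means the parser rejected the document.
print(well_formedness_error(DOC))
print(well_formedness_error("<root><test/></root>"))
```

The first call reports an error and the second returns None, so this parser also treats the references in the names as a well-formedness problem.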

What parsers are you using?

dommer
Thanks, dommer! I agree it seems to be a well-formedness error. There are other places in the spec where PEReferences and References are explicitly shown to be allowed, but they are not shown for tag names or attribute names. The parsers I am testing are internally generated.
ScottProuty
A: 

This is very simple: no entities can be used within names. Both "parsers" are wrong here. The XML specification defines this quite clearly. There are no hidden default rules; if a construct is not included in a production, it is not allowed.

Entities can only be used within regular character content and attribute values. They can also appear in some other places (comments, processing instructions, DTD subsets), but there they won't be expanded (i.e. they are not recognized as entities).
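To illustrate the difference, here is a small sketch using Python's `xml.dom.minidom`. The sample document and the helper name `inspect_document` are made up for this example. It puts the same references in an attribute value, in character content, and in a comment, then shows what the parser hands back for each.

```python
from xml.dom import minidom

# The same references in an attribute value, in character content, and in a comment.
DOC = '<root attr="&apos;&#x27;&#39;">&apos;&#x27;&#39;<!-- &apos; --></root>'


def inspect_document(text):
    """Return (attribute value, character content, list of comment bodies)."""
    root = minidom.parseString(text).documentElement
    attr = root.getAttribute("attr")
    content = "".join(
        node.data for node in root.childNodes if node.nodeType == node.TEXT_NODE
    )
    comments = [
        node.data for node in root.childNodes if node.nodeType == node.COMMENT_NODE
    ]
    return attr, content, comments


attr, content, comments = inspect_document(DOC)
print(repr(attr))      # "'''"  (references expanded)
print(repr(content))   # "'''"  (references expanded)
print(comments)        # [' &apos; ']  (left as literal text)
```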

StaxMan