I can't believe I can't find this information easily accessible, so:

1) Which characters cannot be incorporated in an XML attribute without entity-encoding them?

Obviously, you need to encode quotes. What about < and >? What else?

2) Where exactly is the official list?

+1  A: 

See 2.2 Characters in "Extensible Markup Language (XML) 1.0 (Third Edition)".

Note that, at least with .NET, if you are using the XML APIs to work with XML, then you won't have to worry about this. It's the reason not to treat XML as being text.
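
To illustrate the point outside .NET, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree only as a stand-in for "an XML API"; the element name, attribute name and sample value are made up for the example. The library does the escaping when it serializes, and parsing the result gives back the original string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample value containing characters that are special in XML.
raw_value = 'Tom & "Jerry" <draft>'

# Build the element through the API instead of concatenating strings.
elem = ET.Element("item")
elem.set("title", raw_value)

# The serializer escapes the attribute value for us.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)

# Parsing the serialized text recovers the original, unescaped value.
parsed = ET.fromstring(serialized)
print(parsed.get("title") == raw_value)
```

The serializer may escape more than the strict minimum (for example the right angle bracket), which is harmless: the parser turns every reference back into the original character.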

John Saunders
I agree on the document, but I don't think that specific section is the right place to look. It lists the characters that are valid in the "text stream", if you will. About .NET and libraries, I couldn't agree more -- but in this particular case I need to edit an existing text file that contains XML.
Euro Micelli
So, why not use the XML APIs to process that text file?
John Saunders
A: 

Here's another list: Escape characters in XML.

Vadim
+2  A: 

As per the (2) current recommendation, specifically regarding character data and Markup, they are (1) the ampersand (&), left angle bracket (<), right angle bracket (>) and both single-quote (') and double-quote (").

codehead
I agree on the section of the spec document. However, not all of those characters "must" be escaped. Can you edit to clarify?
Euro Micelli
+4  A: 

Here is the definition of what is allowed in an attribute value (the AttValue production in the XML 1.0 spec):

AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"

So the value cannot contain the quote character that opens and closes it, a bare ampersand (&) that is not part of a reference, or a left angle bracket (<). A right angle bracket (>) and the other kind of quote are allowed unescaped.

You should also not be using any characters that are outright illegal anywhere in an XML document (such as form feeds).
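
If you really do have to patch the text by hand, the rule above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: escape_attribute is a hypothetical helper (not part of any library), it treats its input as raw text (so an existing "&amp;" would be escaped again), and it does not check for characters that are illegal in XML altogether.

```python
def escape_attribute(value: str, quote: str = '"') -> str:
    """Return value as a quoted XML attribute value (hypothetical helper).

    Escapes only what the attribute-value grammar requires: '&', '<',
    and whichever quote character delimits the value.
    """
    if quote not in ('"', "'"):
        raise ValueError("quote must be a single or double quote character")

    # Escape '&' first so the references produced below are not re-escaped.
    escaped = value.replace("&", "&amp;").replace("<", "&lt;")

    # Only the delimiting quote needs escaping; the other kind may stay as-is.
    if quote == '"':
        escaped = escaped.replace('"', "&quot;")
    else:
        escaped = escaped.replace("'", "&apos;")

    return quote + escaped + quote


print(escape_attribute('a < b & "c" > d'))
# prints: "a &lt; b &amp; &quot;c&quot; > d"

print(escape_attribute('it\'s "ok"', quote="'"))
# prints: 'it&apos;s "ok"'
```

Note that the right angle bracket is left alone, since the grammar does not exclude it from attribute values.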

great_llama