I have a program that generates XML files from data in a database. In short, the code does the following:

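// Requires: using System.Data; using System.Data.SqlClient; using System.Xml;
string dsn = "a db connection string";
XmlDocument d = new XmlDocument();
using (SqlConnection con = new SqlConnection(dsn)) {
    con.Open();
    string sql = "select id as Id, comment as Comment from Test where ... ";
    using (SqlCommand cmd = new SqlCommand(sql, con)) {
        DataSet ds = new DataSet("EXPORT");   // DataSet name becomes the root element
        SqlDataAdapter da = new SqlDataAdapter(cmd);
        da.Fill(ds, "Test");                  // table name becomes the row element
        d.LoadXml(ds.GetXml());               // serialize the DataSet and load it into the document
    }
}
d.Save(@"c:\test.xml");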

When I look at the XML file, it contains the invalid character reference &#x1A;:

<EXPORT>
  <Test>
    <Id>2</Id>
    <Comment> Keyboard NB&#x1A;5 linked</Comment>
  </Test>
</EXPORT>

Firefox cannot open this XML file; it reports an invalid character ...

That entity is reserved in ISO 8859-1 and CP1252 and should not be rendered by browsers. But why does XmlDocument output XML that cannot be parsed as valid? Or is it a valid XML document that just cannot be parsed by browsers or imported by Excel and so on? Is there an easy way of getting rid of those reserved 'invalid characters', or of encoding them in a way that browsers do not have a problem with?

Many thanks for your opinions and tips.

A: 

Have a look at this answer to see if it helps:

http://stackoverflow.com/questions/1876022/net-dataset-getxml-whats-the-default-encoding

gt
Thanks for your tip, but I think the string I get from ds.GetXml() is internally Unicode (UTF-16), so writing it to a text file without changing the encoding should be OK, shouldn't it?
Tobias Pirzer
A: 

I'd think you're processing a Control-Z (end of text file) character. Is this possible?

Hm, Google says "reserved, unused" for ISO 8859-1 and its superset CP1252. Maybe it is an end-of-file mark, but the content of the DB is a black box for me, so I have no way to filter the input going into the DB tables ...
Tobias Pirzer
A: 

I've run into this a few times when creating/manipulating XML from SQL data.

But why does XmlDocument output xml that cannot be parsed as valid - or is it a valid xml document that just cannot be parsed by Browsers or imported by Excel and so on

XmlDocument doesn't perform any validation on the data that you send it; it leaves that to you (the developer). This XML document should be rejected by almost everything that uses XML (but I could be wrong about that ... you could always test it :P)
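One way to run that test from .NET itself is to read the generated XML with an XmlReader that has character checking switched on. The following is only a sketch: the class and method names (XmlCheck, FindWellFormednessError) are made up for illustration, and it assumes the XML is available as a string.

```csharp
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Hypothetical helper: returns null when 'xml' parses as well-formed XML,
    // otherwise the message of the XmlException raised by the reader.
    public static string FindWellFormednessError(string xml)
    {
        // CheckCharacters is true by default; it is set explicitly here to
        // make clear that illegal characters and character references
        // (such as &#x1A;) should be rejected.
        var settings = new XmlReaderSettings { CheckCharacters = true };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Walk the whole document; the reader throws at the first problem.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```

Passing the contents of the saved file to this helper should give an error message for the Comment element shown in the question, and null once the offending character has been removed.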

Almost every time I've hit this problem, I ended up either replacing the offending data with the proper character (if it has one) or just getting rid of it.

You could also try putting your XML inside a CDATA block, but that will bloat the file a tiny bit (not sure how big your file will be overall).

Tony Abrams
+1  A: 

Take a look at this: http://stackoverflow.com/questions/3136954/xml-parse-error-on-illegal-character

Conclusion (as I understood it): With XML 1.0 it is impossible to store this value.

ckuetbach
A: 

Make sure to escape XML entities, e.g. & => &amp;. Otherwise, wrap the data in CDATA: http://en.wikipedia.org/wiki/CDATA

Allen Hamilton
+1  A: 

Not all characters are representable in XML.

In XML 1.0, none of the characters with values less than 0x20 can be used, except for TAB (0x09), LF (0x0A) and CR (0x0D).

In XML 1.1, just about anything except NUL (0x00) can be used.

If you have the option to use XML 1.1, and the receiving program supports XML 1.1 (not many do), then you can escape the 0x1A as &#26; or &#x1A;.

Wrapping it in CDATA is not a solution either; CDATA is just a convenience for escaping groups of characters differently than the standard &-mechanism.

Otherwise, you will need to remove it prior to serializing.
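A minimal sketch of such a removal step in C#, following the XML 1.0 character ranges described above. The names XmlSanitizer and StripInvalidXmlChars are invented for this example; the ranges above 0x20 (excluding surrogates that are not part of a pair, plus 0xFFFE and 0xFFFF) come from the XML 1.0 Char production rather than from this answer.

```csharp
using System.Text;

public static class XmlSanitizer
{
    // True for UTF-16 code units that XML 1.0 allows on their own:
    // TAB, LF, CR, 0x20-0xD7FF and 0xE000-0xFFFD.
    private static bool IsXml10Char(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Hypothetical helper: returns a copy of 'text' with every character
    // that XML 1.0 cannot represent removed (for example 0x1A).
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (IsXml10Char(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes a character in
                // 0x10000-0x10FFFF, which XML 1.0 allows: keep both halves.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (control characters, lone surrogates,
            // 0xFFFE, 0xFFFF) is dropped.
        }
        return sb.ToString();
    }
}
```

In the question's code this would be applied to the Comment values in the DataSet before ds.GetXml() is called, since after that point the character has presumably already been turned into the &#x1A; reference. On .NET 4.0 and later, XmlConvert.IsXmlChar could probably stand in for the hand-written range check.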

lavinio
Sorry for replying so late to this old question. I removed the characters before serializing. Thanks.
Tobias Pirzer