I have legacy code (I didn't write it) that always included the encoding attribute in the XML declaration, but after recompiling it with D2010, TXMLDocument no longer includes the encoding. Because the XML data has accented characters in both tags and data, TXMLDocument.LoadFromFile throws an EDOMParseError saying that an invalid character was found in the file. Relevant code:

   Doc := TXMLDocument.Create(nil);  
   try
     Doc.Active := True;
     Doc.Encoding := XMLEncoding;
     RootNode := Doc.CreateElement('Test', '');
     Doc.DocumentElement := RootNode;
     <snip>
     //Result := Doc.XML.Text;
     Doc.SaveToXML(Result);    // Both lines give the same result

On older versions of Delphi, the following line is generated:

<?xml version="1.0" encoding="ISO-8859-1"?>

On D2010, this is generated:

<?xml version="1.0"?>

If I manually change that line to include the encoding, everything works as it has for years.
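
As a stopgap, the manual fix described above could be automated with plain string replacement. This is only a minimal sketch: `AddEncodingToDeclaration` is a hypothetical helper name, and it assumes the generated text starts with exactly `<?xml version="1.0"?>`. It also only changes the declaration, so the bytes eventually written to disk still have to match the declared encoding.

```delphi
program AddEncodingDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of the original code): swaps the bare XML
// declaration for one that names the encoding. If the bare declaration is
// not found, the input is returned unchanged.
function AddEncodingToDeclaration(const Xml, Encoding: string): string;
const
  BareDeclaration = '<?xml version="1.0"?>';
begin
  // An empty flag set means only the first occurrence is replaced.
  Result := StringReplace(Xml, BareDeclaration,
    Format('<?xml version="1.0" encoding="%s"?>', [Encoding]), []);
end;

begin
  Writeln(AddEncodingToDeclaration('<?xml version="1.0"?><Test/>', 'ISO-8859-1'));
end.
```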

UPDATE: XMLEncoding is a constant, defined as follows:

  XMLEncoding = 'ISO-8859-1';
+3  A: 

You'll want to look at IXMLDocument.CreateProcessingInstruction. I use OmniXML, but its syntax is similar and should get you started:

var
  FDoc: IXMLDocument;
  PI:  IXMLProcessingInstruction;
begin
  FDoc := OmniXML.CreateXMLDoc();
  PI := FDoc.CreateProcessingInstruction('xml', 'version="1.0" encoding="UTF-8"');
  FDoc.AppendChild(PI);
end;
Ken White
That's exactly what Microsoft recommends for MSXML, too: http://msdn.microsoft.com/en-us/library/aa468560.aspx. However, the thing at the start of the document isn't technically a processing instruction. It's an *XML declaration*; the string "xml" isn't really allowed for the name of a processing instruction, so it appears the `CreateProcessingInstruction` method is doing double duty.
Rob Kennedy
@Rob: That's probably why it took me a while a couple of years ago to figure it out (didn't have the MSDN link you provided at the time). However, it actually could be considered a processing instruction, couldn't it, if it's telling the parser how to interpret the content? "This is XML, and it's in this character set - that will make it easier to figure out."
Ken White
A: 
var 
  XMLStream: TStringStream;
begin  
   Doc := TXMLDocument.Create(nil);  
   try
     Doc.Active := True;
     Doc.Encoding := XMLEncoding;
     RootNode := Doc.CreateElement('Test', '');
     Doc.DocumentElement := RootNode;
     <snip>
     // Save through a stream instead of the XML property so the encoding is kept
     XMLStream := TStringStream.Create;
     Doc.SaveToStream(XMLStream);
     Result := XMLStream.DataString;
     XMLStream.Free;

After Ken's answer and the linked MSXML article, I decided to investigate the XML property and the SaveToXML method. Both use the XML property of the MSXMLDOM implementation, which, according to the article, does not include the encoding when read directly (see the "Creating New XML Documents with MSXML" section, right after the use of the CreateProcessingInstruction method).

UPDATE:

I found that accented characters were getting truncated in the resulting XML. When the processor of that XML started to throw strange errors, we saw that the characters were being converted to numeric character constants (for example, #13 is the numeric character constant for carriage return). So I used a TStringStream to FINALLY get it right.
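
A more defensive variant of the stream approach might look like the sketch below. It is written under several assumptions that are not in the code above: it uses `NewXMLDocument` (an interface reference) instead of `TXMLDocument.Create(nil)`, passes an explicit ISO-8859-1 `TEncoding` (Windows code page 28591) to the stream rather than relying on the system default, and assumes that `SaveToStream` writes the bytes in the declared encoding. `BuildXml` and `Latin1CodePage` are illustrative names.

```delphi
program SaveWithEncodingDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, ActiveX, XMLIntf, XMLDoc;

const
  XMLEncoding = 'ISO-8859-1';
  Latin1CodePage = 28591; // Windows code page number for ISO-8859-1

// Illustrative function: builds a tiny document and returns it as a string,
// going through a stream so the declaration keeps its encoding attribute.
function BuildXml: string;
var
  Doc: IXMLDocument;
  XMLStream: TStringStream;
begin
  Doc := NewXMLDocument;            // returns an already active document
  Doc.Encoding := XMLEncoding;
  Doc.DocumentElement := Doc.CreateElement('Test', '');

  // The stream owns the TEncoding instance by default and frees it.
  XMLStream := TStringStream.Create('', TEncoding.GetEncoding(Latin1CodePage));
  try
    Doc.SaveToStream(XMLStream);
    // DataString decodes the saved bytes using the stream's encoding.
    Result := XMLStream.DataString;
  finally
    XMLStream.Free;                 // released even if saving raises
  end;
end;

begin
  CoInitialize(nil);                // the MSXML DOM vendor needs COM in a console app
  try
    Writeln(BuildXml);
  finally
    CoUninitialize;
  end;
end.
```

The `try..finally` around the stream is the main difference from the snippet above: the stream is freed even when `SaveToStream` raises an exception.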

Fabricio Araujo