+1  A: 

Does this style help?

<name>
   <![CDATA[
     =b?olu
   ]]>
</name>

Either that or encoding should do the trick.

EDIT: Found this page: http://www.eggheadcafe.com/articles/system.xml.xmlserialization.asp. Specifically, this code for deserialization:

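// Needs: using System; using System.IO; using System.Text; using System.Xml; using System.Xml.Serialization;
public Object DeserializeObject(String pXmlizedString)
{
    XmlSerializer xs = new XmlSerializer(typeof(Automobile));
    // StringToUTF8ByteArray is a helper defined elsewhere (not shown here).
    MemoryStream memoryStream = new MemoryStream(StringToUTF8ByteArray(pXmlizedString));
    // The writer is never used for reading; it is kept as it appears on the linked page.
    XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
    return xs.Deserialize(memoryStream);
}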

The "StringToUTF8ByteArray" and "Encoding.UTF8" parts look strangely absent from yours. I'm guessing .NET doesn't like reading the encoding of your actual XML file...?

Glenn
Thanks Glenn. The issue is that the XML file is my application's input and I cannot change it in any way. I need to find a way to filter out invalid characters and continue to parse (deserialize) the remaining ones. If there is a way to accept such characters, that would be even better!
George2
Sounds like you need either a SAX parser (http://stackoverflow.com/questions/127869/sax-vs-xmltextreader-sax-in-c), or you need to pre-process the XML yourself and strip/encode problem characters with regex or similar. You might have to dig around for a regex example. I'm not familiar enough with it to give one here.
Glenn
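
A minimal sketch of the pre-processing idea from the comment above, using plain range checks rather than a regex. The names XmlSanitizer and StripInvalidXmlChars are hypothetical and not from the thread; the allowed ranges follow the Char production of the XML 1.0 specification.

```csharp
using System;
using System.Text;

public static class XmlSanitizer
{
    // True for a single UTF-16 code unit that XML 1.0 allows on its own.
    private static bool IsLegalXmlChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Returns a copy of the input with characters that XML 1.0 forbids removed.
    public static string StripInvalidXmlChars(string text)
    {
        if (text == null) throw new ArgumentNullException("text");

        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // A valid surrogate pair encodes U+10000..U+10FFFF, which XML allows.
                sb.Append(text[i]).Append(text[i + 1]);
                i++;
            }
            else if (IsLegalXmlChar(text[i]))
            {
                sb.Append(text[i]);
            }
            // Everything else (0xFFFE, 0xFFFF, lone surrogates,
            // most control characters) is dropped.
        }
        return sb.ToString();
    }
}
```

The cleaned string could then be wrapped in a StringReader and passed to XmlSerializer.Deserialize. Note that this silently discards data, so it only makes sense when losing those characters is acceptable.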
Oh right, even with a SAX parser, you still need to sanitize characters. So you might have to overload it.
Glenn
Is catching InvalidOperationException during XML serialization a good way to check whether the XML file is valid, or not?
George2
Actually, when I open this XML file in VSTS, I get this error: Error 1 Character '?', hexadecimal value 0xffff is illegal in XML documents. I am confused, since in the binary form there are no 0xffff values.
George2
Catching exceptions isn't a good solution because it won't allow you to continue parsing. Your XML *is* invalid. So you need to pre-process it somehow. Which is more difficult? Loading the file as text, pre-processing, then loading XML, or changing the original source so that it generates valid XML?
Glenn
Hi Glenn, I did some research and found that 0xEF 0xBF 0xBF is the valid UTF-8 encoding for character 0xFFFF. Why does the XML deserializer think it is invalid?
George2
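
A short sketch that may help with the question above. The bytes EF BF BF are well-formed UTF-8, but the XML 1.0 specification excludes U+FFFE and U+FFFF from the characters a document may contain, so a conforming reader is expected to reject them regardless of encoding. The class and method names here are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class NoncharacterDemo
{
    // Decodes the three bytes from the question and returns the resulting code unit.
    public static int DecodeCodePoint()
    {
        byte[] utf8 = { 0xEF, 0xBF, 0xBF };
        string s = Encoding.UTF8.GetString(utf8);
        return s[0];
    }

    // Returns true if an XmlReader with default settings refuses to read the document.
    public static bool ReaderRejects(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```

Here DecodeCodePoint shows that the decoding step succeeds, while ReaderRejects("<name>a\uFFFFb</name>") shows that the failure comes from the XML character rules, not from UTF-8.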
I'm just speculating here that it is invalid because XML is a text format, and you require a binary JPG to show us the correct view of the data *and* your parsing is failing. In doing so I found a page that might help. Added to my answer.
Glenn
+1  A: 

Have you tried the DataContractSerializer instead? I encountered an interesting situation when someone copied and pasted some Word or Excel content into my web application: the string contained some invalid control characters (such as a vertical tab). To my surprise, this was serialized when sending it to a WCF service and even read back 100% intact when requesting it. The pure .NET environment did not have a problem with this, so I assume that the DataContractSerializer can handle such content (which is, IMHO, a violation of the XML spec, however).

We also had a Java client accessing the same service, and it failed when receiving this record...

[Edit after ugly format in my comment below]

Try this:

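// Needs: using System.Runtime.Serialization; using System.Text; using System.Xml;
DataContractSerializer serializer = new DataContractSerializer(typeof(MyType));
// Write the object to the file as UTF-8 XML.
using (XmlWriter xmlWriter = new XmlTextWriter(filePath, Encoding.UTF8))
{
  serializer.WriteObject(xmlWriter, instanceOfMyType);
}
// Read the object back from the same file.
MyType result;
using (XmlReader xmlReader = new XmlTextReader(filePath))
{
  result = serializer.ReadObject(xmlReader) as MyType;
}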

The second Marc's comment is about DataContractSerializer's habit of emitting XML elements instead of XML attributes:

<AnElement>value</AnElement>

instead of

<AnElement AnAttribute="value" />
Marc Wittke
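
To make the element-versus-attribute point concrete, here is a hedged sketch that serializes the same value once with DataContractSerializer and once with XmlSerializer. The types ContractItem and AttributeItem and the helper class are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

[DataContract(Name = "Item", Namespace = "")]
public class ContractItem
{
    // Data members are written as child elements.
    [DataMember]
    public string Name { get; set; }
}

[XmlRoot("Item")]
public class AttributeItem
{
    // XmlSerializer honors this and writes Name as an attribute.
    [XmlAttribute]
    public string Name { get; set; }
}

public static class ElementVsAttributeDemo
{
    private static XmlWriterSettings Settings()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        return settings;
    }

    public static string WithDataContractSerializer(ContractItem item)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(ContractItem));
        StringBuilder sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, Settings()))
        {
            serializer.WriteObject(writer, item);
        }
        return sb.ToString();
    }

    public static string WithXmlSerializer(AttributeItem item)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(AttributeItem));
        StringBuilder sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, Settings()))
        {
            serializer.Serialize(writer, item);
        }
        return sb.ToString();
    }
}
```

With Name set to "value", the first method produces output containing <Name>value</Name>, while the second produces an Item element carrying Name="value". If the input file relies on attributes, XmlSerializer is likely the better fit.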
But I am not using WCF. Can I still use DataContractSerializer?
George2
Sure you can, just read the documentation. It is very easy!
Dabblernl
As long as the data doesn't involve attributes...
Marc Gravell
Marc, what do you mean by "data doesn't involve attributes"? Could you show a sample here?
George2
Hi Dabblernl, you said "just read the documentation", but I could not find any link or document title in your comment. I would appreciate it if you could recommend a document to read.
George2
Try this: DataContractSerializer serializer = new DataContractSerializer(typeof(MyType)); using (XmlWriter xmlWriter = new XmlTextWriter(filePath, Encoding.UTF8)) { serializer.WriteObject(xmlWriter, instanceOfMyType); } using (XmlReader xmlReader = new XmlTextReader(filePath)) { MyType result = serializer.ReadObject(xmlReader) as MyType; } The second Marc's comment is about DataContractSerializer's habit of emitting XML elements instead of XML attributes (<AnElement>value</AnElement> instead of <AnElement AnAttribute="value" />)
Marc Wittke
WTF - note to myself: no code in comments. I'll post a new answer, wait...
Marc Wittke
A: 

The "invalid characters" look like they might be intended to be encoded Unicode characters. Perhaps the wrong encoding is being used?

Can you ask the originators of this document what character they meant to include at that location? Perhaps ask them how they generated the document?

John Saunders