Hi,

I'm trying to insert into an XML column (SQL Server 2008 R2), but the server complains:

System.Data.SqlClient.SqlException (0x80131904): 
XML parsing: line 1, character 39, unable to switch the encoding

I found out that the XML column has to be UTF-16 in order for the insert to succeed.

The code I'm using is:

 XmlSerializer serializer = new XmlSerializer(typeof(MyMessage));
 StringWriter str = new StringWriter();
 serializer.Serialize(str, message);
 string messageToLog = str.ToString();

How can I serialize an object to a UTF-8 string?

EDIT: Ok, sorry for the mixup - the string needs to be in UTF-8. You were right - it's UTF-16 by default, and if I try to insert in UTF-8 it passes. So the question is how to serialize into UTF-8.

Example: this causes errors when trying to insert into SQL Server:

    <?xml version="1.0" encoding="utf-16"?>
    <MyMessage>Teno</MyMessage>

This doesn't:

    <?xml version="1.0" encoding="utf-8"?>
    <MyMessage>Teno</MyMessage>

Update

I figured out when SQL Server 2008 needs utf-8 and when it needs utf-16 in the encoding attribute of the XML declaration you're inserting into an XML column:

When the XML is declared as utf-8, add the parameter to the SQL command like this:

 sqlcmd.Parameters.Add("ParamName", SqlDbType.VarChar).Value = xmlValueToAdd;

If xmlValueToAdd is declared with encoding=utf-16, the line above produces an error on insert. Also, VarChar means that national characters aren't preserved (they turn into question marks).

To insert XML declared as utf-16, either use SqlDbType.NVarChar or SqlDbType.Xml in the previous example, or just don't specify a type at all:

 sqlcmd.Parameters.Add(new SqlParameter("ParamName", xmlValueToAdd));
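
For completeness, here is a minimal sketch of the explicitly typed variant for XML declared as utf-16. The table name `MessageLog`, the column name `Message`, and the helper `MessageLogCommands.BuildInsert` are made up for illustration; only `SqlCommand`, `SqlParameterCollection.Add(string, SqlDbType)` and `SqlDbType.Xml` come from ADO.NET itself.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class MessageLogCommands
{
    // Builds (but does not execute) an INSERT whose parameter is typed as XML.
    // Table and column names are hypothetical.
    public static SqlCommand BuildInsert(SqlConnection connection, string xmlValueToAdd)
    {
        var sqlcmd = new SqlCommand(
            "INSERT INTO MessageLog (Message) VALUES (@Message)", connection);

        // The string is sent as UTF-16, so a declaration in xmlValueToAdd
        // should say encoding="utf-16" (or be omitted).
        sqlcmd.Parameters.Add("@Message", SqlDbType.Xml).Value = xmlValueToAdd;
        return sqlcmd;
    }
}
```

The caller still has to open the connection and call `ExecuteNonQuery()` on the returned command.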
+2  A: 

A string is always UTF-16 in .NET, so as long as you stay inside your managed app you don't have to care about which encoding it is.
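
To illustrate: an encoding only comes into play once the string is turned into bytes. A small sketch (the class and method names are just for this example):

```csharp
using System.Text;

public static class EncodingDemo
{
    // Number of bytes the same .NET string occupies once it is
    // actually encoded with the given encoding.
    public static int EncodedLength(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }
}
```

`EncodingDemo.EncodedLength("Teno", Encoding.UTF8)` gives 4, while `EncodingDemo.EncodedLength("Teno", Encoding.Unicode)` gives 8, even though the string itself is the same object in both calls.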

The problem is more likely where you talk to the SQL server. Your question doesn't show that code, so it's hard to pinpoint the exact error. My suggestion is to check whether there's a property or attribute you can set in that code that specifies the encoding of the data sent to the server.

Isak Savo
You were right - it seems that Sql was configured to accept only UTF-8 in xml columns. +1
veljkoz
A: 

The default encoding for an XML serializer should be UTF-16. Just to make sure, you can try:

XmlSerializer serializer = new XmlSerializer(typeof(YourObject));

// create a MemoryStream here, we are just working
// exclusively in memory
System.IO.Stream stream = new System.IO.MemoryStream();

// The XmlTextWriter takes a stream and encoding
// as one of its constructors; Encoding.Unicode is UTF-16
// (there is no Encoding.UTF16 in .NET)
System.Xml.XmlTextWriter xtWriter = new System.Xml.XmlTextWriter(stream, System.Text.Encoding.Unicode);

serializer.Serialize(xtWriter, yourObjectInstance);

xtWriter.Flush();
Vinay B R
+1  A: 

You are serializing to a string rather than a byte array so, at this point, no encoding has happened yet.

What does the start of "messageToLog" look like? Is the XML specifying an encoding (e.g. utf-8) which subsequently turns out to be wrong?

Edit

Based on your further info it sounds like the string is automatically converted to utf-8 when it is passed to the database, but the database chokes because the XML declaration says it is utf-16.

In which case, you don't need to serialize to utf-8. You need to serialize with the "encoding=" omitted from the XML. The XmlFragmentWriter (not a standard part of .Net, Google it) lets you do this.
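
Another way to drop the declaration, using only built-in types, might be `XmlWriterSettings.OmitXmlDeclaration`. This is a sketch, not tested against your database; the class name `XmlWithoutDeclaration` is made up for the example:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class XmlWithoutDeclaration
{
    // Serializes 'source' to an XML string that has no <?xml ...?> line,
    // so the text carries no encoding name at all.
    public static string Serialize<T>(T source)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var serializer = new XmlSerializer(typeof(T));

        using (var text = new StringWriter())
        {
            using (var writer = XmlWriter.Create(text, settings))
            {
                serializer.Serialize(writer, source);
            }
            // The XmlWriter has been disposed (and flushed) at this point.
            return text.ToString();
        }
    }
}
```

With the question's code that would be `string messageToLog = XmlWithoutDeclaration.Serialize(message);`.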

arx
+1  A: 

Although a .NET string is always UTF-16, you still need to serialize the object using UTF-16 encoding. That should be something like this:

public static string ToString(object source, Type type, Encoding encoding)
        {
            // The string to hold the object content
            String content;

            // Create a MemoryStream into which the data can be written and read
            using (var stream = new MemoryStream())
            {
                // Create the xml serializer, the serializer needs to know the type
                // of the object that will be serialized
                var xmlSerializer = new XmlSerializer(type);

                // Create a XmlTextWriter to write the xml object source, we are going
                // to define the encoding in the constructor
                using (var writer = new XmlTextWriter(stream, encoding))
                {
                    // Save the state of the object into the stream
                    xmlSerializer.Serialize(writer, source);
                    // Flush the stream
                    writer.Flush();

                    // Read the stream into a string
                    using (var reader = new StreamReader(stream, encoding))
                    {
                        // Set the stream position to the beginning
                        stream.Position = 0;
                        // Read the stream into a string
                        content = reader.ReadToEnd();
                    }
                }
            }
            // Return the xml string with the object content
            return content;
        }

By setting the encoding to Encoding.Unicode, not only will the string be UTF-16, but the XML declaration should also say UTF-16:

<?xml version="1.0" encoding="utf-16"?>...
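
If all you need is to control the encoding name in the XML declaration while still ending up with a .NET string, a shorter sketch is a StringWriter subclass that reports the encoding you want. The names `EncodingStringWriter` and `DeclaredEncodingSerializer` are made up for this example; the only thing relied on is that the serializer takes the declared encoding from the writer's `Encoding` property.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter that reports a caller-chosen encoding
// instead of the default UTF-16.
public sealed class EncodingStringWriter : StringWriter
{
    private readonly Encoding _encoding;

    public EncodingStringWriter(Encoding encoding)
    {
        _encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return _encoding; }
    }
}

public static class DeclaredEncodingSerializer
{
    // Returns the XML as a string whose declaration names 'declared'.
    // The returned string itself is still UTF-16 in memory.
    public static string Serialize<T>(T source, Encoding declared)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new EncodingStringWriter(declared))
        {
            serializer.Serialize(writer, source);
            return writer.ToString();
        }
    }
}
```

Calling `DeclaredEncodingSerializer.Serialize(message, Encoding.UTF8)` should give a declaration with `encoding="utf-8"`, and passing `Encoding.Unicode` should give `encoding="utf-16"`.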
Pedro
This is it. It's the most flexible one
veljkoz
Hmm, correct me if I'm wrong here, but all this code is doing is setting `encoding="utf-16"` in the top of the XML data. The `content` string is UTF-16 regardless of what encoding you use for your `XmlTextWriter`.
Isak Savo
Yes, precisely. It's not a question of whether the string is UTF-8 or UTF-16; as you said previously, it's always UTF-16. The question is whether to set encoding="utf-16" or "utf-8".
Pedro