views:

236

answers:

2

Hi all

I'm trying to write an XML file with UTF-8 encode, and the original string can have invalid characters like 'á', so, i need to change these invalid characters to a valid ones.

I know that there is an encoding method that take, for example, character á and transform it to group of characters á.

I am trying to achive this with C#but i have no succes on it. I am using Encoding.UTF8 functions but i only end with the sema character (i.e: á) or a '?' character.

So, do you know with is the correct way to achive this character change with C# ??

Thanks for your time and help :)

LLORENS

+4  A: 

á is not an "invalid" character. It has a UTF-8 encoding (bytes 195 and 161), and Nick is right that if you construct everything correctly this will be transparent.

Matthew Flaschen
A: 

You can use any one method.

Here are 4 ways you can encode XML in C#:

  1. string.Replace() 5 times

This is ugly but it works. Note that Replace("&", "&") has to be the first replace so we don't replace other already escaped &.

string xml = "<node>it's my \"node\" & i like it<node>";
encodedXml = xml.Replace("&","&amp;").Replace("<","&lt;").Replace(">","&gt;").Replace("\"", "&quot;").Replace("'", "&apos;");

// RESULT: &lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it&lt;node&gt;
  1. System.Web.HttpUtility.HtmlEncode()

Used for encoding HTML, but HTML is a form of XML so we can use that too. Mostly used in ASP.NET apps. Note that HtmlEncode does NOT encode apostrophes ( ' ).

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = HttpUtility.HtmlEncode(xml);

// RESULT: &lt;node&gt;it's my &quot;node&quot; &amp; i like it&lt;node&gt;
  1. System.Security.SecurityElement.Escape()

In Windows Forms or Console apps I use this method. If nothing else it saves me including the System.Web reference in my projects and it encodes all 5 chars.

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = System.Security.SecurityElement.Escape(xml);

// RESULT: &lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it&lt;node&gt;
  1. System.Xml.XmlTextWriter

Using XmlTextWriter you don't have to worry about escaping anything since it escapes the chars where needed. For example in the attributes it doesn't escape apostrophes, while in node values it doesn't escape apostrophes and qoutes.

string xml = "<node>it's my \"node\" & i like it<node>";
using (XmlTextWriter xtw = new XmlTextWriter(@"c:\xmlTest.xml", Encoding.Unicode))
{

    xtw.WriteStartElement("xmlEncodeTest");
    xtw.WriteAttributeString("testAttribute", xml);
    xtw.WriteString(xml);
    xtw.WriteEndElement();

}

// RESULT:
/*
<xmlEncodeTest testAttribute="&lt;node&gt;it's my &quot;node&quot; &amp; i like it&lt;node&gt;">
    &lt;node&gt;it's my "node" &amp; i like it&lt;node&gt;
</xmlEncodeTest>
*/

[http://weblogs.sqlteam.com/mladenp/archive/2008/10/21/Different-ways-how-to-escape-an-XML-string-in-C.aspx]

Rohit