Are there any classes, preferably open source, to convert ASCII to the XML character set? I will be using the class in either VC++ or C#.

My ASCII text has some printable characters which are not in the XML character set.

I just tried to send a resume, which is in the ASCII character set, and store it in an online CRM, and I got this error message:

javax.xml.bind.UnmarshalException - with linked exception: Message: Character reference "&#x13" is an invalid XML character.]

Thanks in advance

+1  A: 

Possibly you don't fully understand what a character set is. XML is not a character set, though XML-based output does use character sets to encode data.

I'd recommend reading through Joel Spolsky's excellent post The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), then come back and have another go at your question.

Brian
+1  A: 

The character reference &#x13 (hexadecimal 13, i.e. decimal 19, a control character) is indeed not a valid XML character. You probably want either &#xD or &#13, both of which denote a carriage return.
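To make the hex/decimal difference concrete, here is a minimal C++ sketch (C++ because the question mentions VC++). The helper names `decodeCharRef` and `isValidXml10Char` are made up for illustration, and the decoder assumes a complete reference including the trailing semicolon. The range check follows the `Char` production of XML 1.0.

```cpp
#include <cstddef>
#include <string>

// XML 1.0 "Char" production: the code points allowed in a document.
bool isValidXml10Char(long cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper: decode "&#13;" (decimal) or "&#x13;" (hexadecimal).
// Returns the code point, or -1 if the text is not a numeric reference.
long decodeCharRef(const std::string& ref) {
    if (ref.size() < 4 || ref.compare(0, 2, "&#") != 0 ||
        ref[ref.size() - 1] != ';')
        return -1;
    const bool hex = (ref[2] == 'x');
    const std::size_t start = hex ? 3 : 2;
    const std::string digits = ref.substr(start, ref.size() - start - 1);
    if (digits.empty()) return -1;
    const long base = hex ? 16 : 10;
    long value = 0;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        const char c = digits[i];
        long d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (hex && c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (hex && c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        value = value * base + d;
        if (value > 0x10FFFF) return -1;  // beyond the Unicode range
    }
    return value;
}
```

With these helpers, `decodeCharRef("&#x13;")` gives 19, which `isValidXml10Char` rejects, while `decodeCharRef("&#xD;")` and `decodeCharRef("&#13;")` both give 13 (carriage return), which it accepts.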

Greg Hewgill
That's what I got in my trial (refer to my post on this page). I got character references such as `&#x13;` and so on. I didn't even get references for line feed and carriage return; they became a literal new line and carriage return.
o.k.w
+2  A: 

Your text won't have any printable characters which aren't available in XML - but it may have some unprintable characters which aren't available in XML.

In particular, Unicode values U+0000 to U+001F are invalid except for tab, carriage return and line feed. If you really need those other control characters, you'll have to create your own form of escaping for them, and unescape them at the other end.
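As a rough illustration of such a home-grown escaping, here is a minimal C++ sketch (C++ because the question mentions VC++). The scheme is an assumption made up for this example, not any standard: a backslash becomes `\\`, and each forbidden control character becomes `\xHH`. The function names are hypothetical, and both ends of the exchange would have to agree on the scheme.

```cpp
#include <cstddef>
#include <string>

// True for the ASCII control characters XML 1.0 forbids: everything
// below 0x20 except tab, line feed and carriage return.
bool isForbiddenControl(unsigned char c) {
    return c < 0x20 && c != '\t' && c != '\n' && c != '\r';
}

// Made-up scheme: '\' -> "\\", forbidden control -> "\xHH" (uppercase hex).
std::string escapeControls(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c == '\\') {
            out += "\\\\";
        } else if (isForbiddenControl(c)) {
            out += "\\x";
            out += hex[c >> 4];
            out += hex[c & 0xF];
        } else {
            out += in[i];
        }
    }
    return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Reverses escapeControls; malformed escapes are copied through unchanged.
std::string unescapeControls(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == '\\') {
            out += '\\';
            i += 1;  // skip the second backslash
        } else if (in[i] == '\\' && i + 3 < in.size() && in[i + 1] == 'x' &&
                   hexValue(in[i + 2]) >= 0 && hexValue(in[i + 3]) >= 0) {
            out += static_cast<char>(hexValue(in[i + 2]) * 16 +
                                     hexValue(in[i + 3]));
            i += 3;  // skip 'x' and the two hex digits
        } else {
            out += in[i];
        }
    }
    return out;
}
```

The escaped text contains no forbidden control characters, so it can be placed in an XML document once `&`, `<` and `>` have been escaped in the usual way. The receiver calls `unescapeControls` after parsing to get the original bytes back.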

Jon Skeet
A: 

Out of curiosity, I took a few minutes to write a simple routine in C# that pumps out an XML string of the 128 ASCII characters. To my surprise, .NET didn't output a truly valid XML document. I guess the way I set the element text wasn't quite right. Anyway, here is the code (comments are welcome):

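using System;
using System.Text;
using System.Xml;

// Build a document with one child element per ASCII code point (0 to 127).
XmlDocument doc = new XmlDocument();
doc.AppendChild(doc.CreateXmlDeclaration("1.0", "us-ascii", ""));
XmlElement elem = doc.CreateElement("ASCII");
doc.AppendChild(elem);
byte[] b = new byte[1];
for (int i = 0; i < 128; i++)
{
    b[0] = Convert.ToByte(i);
    // Element names run from ASCII_000 to ASCII_127.
    XmlElement e = doc.CreateElement("ASCII_" + i.ToString().PadLeft(3,'0'));
    // Decode the single byte as ASCII and use it as the element text.
    e.InnerText = Encoding.ASCII.GetString(b);
    elem.AppendChild(e);
}
Console.WriteLine(doc.OuterXml);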

Here is the formatted output:

<?xml version="1.0" encoding="us-ascii" ?>
<ASCII>
    <ASCII_000>&#x0;</ASCII_000>
    <ASCII_001>&#x1;</ASCII_001>
    <ASCII_002>&#x2;</ASCII_002>
    <ASCII_003>&#x3;</ASCII_003>
    <ASCII_004>&#x4;</ASCII_004>
    <ASCII_005>&#x5;</ASCII_005>
    <ASCII_006>&#x6;</ASCII_006>
    <ASCII_007>&#x7;</ASCII_007>
    <ASCII_008>&#x8;</ASCII_008>
    <ASCII_009> </ASCII_009>
    <ASCII_010>
    </ASCII_010>
    <ASCII_011>&#xB;</ASCII_011>
    <ASCII_012>&#xC;</ASCII_012>
    <ASCII_013>
    </ASCII_013>
    <ASCII_014>&#xE;</ASCII_014>
    <ASCII_015>&#xF;</ASCII_015>
    <ASCII_016>&#x10;</ASCII_016>
    <ASCII_017>&#x11;</ASCII_017>
    <ASCII_018>&#x12;</ASCII_018>
    <ASCII_019>&#x13;</ASCII_019>
    <ASCII_020>&#x14;</ASCII_020>
    <ASCII_021>&#x15;</ASCII_021>
    <ASCII_022>&#x16;</ASCII_022>
    <ASCII_023>&#x17;</ASCII_023>
    <ASCII_024>&#x18;</ASCII_024>
    <ASCII_025>&#x19;</ASCII_025>
    <ASCII_026>&#x1A;</ASCII_026>
    <ASCII_027>&#x1B;</ASCII_027>
    <ASCII_028>&#x1C;</ASCII_028>
    <ASCII_029>&#x1D;</ASCII_029>
    <ASCII_030>&#x1E;</ASCII_030>
    <ASCII_031>&#x1F;</ASCII_031>
    <ASCII_032> </ASCII_032>
    <ASCII_033>!</ASCII_033>
    <ASCII_034>"</ASCII_034>
    <ASCII_035>#</ASCII_035>
    <ASCII_036>$</ASCII_036>
    <ASCII_037>%</ASCII_037>
    <ASCII_038>&amp;</ASCII_038>
    <ASCII_039>'</ASCII_039>
    <ASCII_040>(</ASCII_040>
    <ASCII_041>)</ASCII_041>
    <ASCII_042>*</ASCII_042>
    <ASCII_043>+</ASCII_043>
    <ASCII_044>,</ASCII_044>
    <ASCII_045>-</ASCII_045>
    <ASCII_046>.</ASCII_046>
    <ASCII_047>/</ASCII_047>
    <ASCII_048>0</ASCII_048>
    <ASCII_049>1</ASCII_049>
    <ASCII_050>2</ASCII_050>
    <ASCII_051>3</ASCII_051>
    <ASCII_052>4</ASCII_052>
    <ASCII_053>5</ASCII_053>
    <ASCII_054>6</ASCII_054>
    <ASCII_055>7</ASCII_055>
    <ASCII_056>8</ASCII_056>
    <ASCII_057>9</ASCII_057>
    <ASCII_058>:</ASCII_058>
    <ASCII_059>;</ASCII_059>
    <ASCII_060>&lt;</ASCII_060>
    <ASCII_061>=</ASCII_061>
    <ASCII_062>&gt;</ASCII_062>
    <ASCII_063>?</ASCII_063>
    <ASCII_064>@</ASCII_064>
    <ASCII_065>A</ASCII_065>
    <ASCII_066>B</ASCII_066>
    <ASCII_067>C</ASCII_067>
    <ASCII_068>D</ASCII_068>
    <ASCII_069>E</ASCII_069>
    <ASCII_070>F</ASCII_070>
    <ASCII_071>G</ASCII_071>
    <ASCII_072>H</ASCII_072>
    <ASCII_073>I</ASCII_073>
    <ASCII_074>J</ASCII_074>
    <ASCII_075>K</ASCII_075>
    <ASCII_076>L</ASCII_076>
    <ASCII_077>M</ASCII_077>
    <ASCII_078>N</ASCII_078>
    <ASCII_079>O</ASCII_079>
    <ASCII_080>P</ASCII_080>
    <ASCII_081>Q</ASCII_081>
    <ASCII_082>R</ASCII_082>
    <ASCII_083>S</ASCII_083>
    <ASCII_084>T</ASCII_084>
    <ASCII_085>U</ASCII_085>
    <ASCII_086>V</ASCII_086>
    <ASCII_087>W</ASCII_087>
    <ASCII_088>X</ASCII_088>
    <ASCII_089>Y</ASCII_089>
    <ASCII_090>Z</ASCII_090>
    <ASCII_091>[</ASCII_091>
    <ASCII_092>\</ASCII_092>
    <ASCII_093>]</ASCII_093>
    <ASCII_094>^</ASCII_094>
    <ASCII_095>_</ASCII_095>
    <ASCII_096>`</ASCII_096>
    <ASCII_097>a</ASCII_097>
    <ASCII_098>b</ASCII_098>
    <ASCII_099>c</ASCII_099>
    <ASCII_100>d</ASCII_100>
    <ASCII_101>e</ASCII_101>
    <ASCII_102>f</ASCII_102>
    <ASCII_103>g</ASCII_103>
    <ASCII_104>h</ASCII_104>
    <ASCII_105>i</ASCII_105>
    <ASCII_106>j</ASCII_106>
    <ASCII_107>k</ASCII_107>
    <ASCII_108>l</ASCII_108>
    <ASCII_109>m</ASCII_109>
    <ASCII_110>n</ASCII_110>
    <ASCII_111>o</ASCII_111>
    <ASCII_112>p</ASCII_112>
    <ASCII_113>q</ASCII_113>
    <ASCII_114>r</ASCII_114>
    <ASCII_115>s</ASCII_115>
    <ASCII_116>t</ASCII_116>
    <ASCII_117>u</ASCII_117>
    <ASCII_118>v</ASCII_118>
    <ASCII_119>w</ASCII_119>
    <ASCII_120>x</ASCII_120>
    <ASCII_121>y</ASCII_121>
    <ASCII_122>z</ASCII_122>
    <ASCII_123>{</ASCII_123>
    <ASCII_124>|</ASCII_124>
    <ASCII_125>}</ASCII_125>
    <ASCII_126>~</ASCII_126>
    <ASCII_127></ASCII_127>
</ASCII>

Update:
Added XML declaration with "us-ascii" encoding.

o.k.w
A: 

You won't need an additional library to do that. Everything from different encodings to embedded binary data is possible through the standard .NET library. Can you give a simple example of what you need?

Alex