I read text from different sources, which can contain characters from different languages and extended characters such as € ƒ „ … † ® ©. I then have to write whatever string I read to an XML file, and I am using PrintWriter in Java to do so. These extended characters, which have ASCII codes greater than 127, produce illegal-character errors in the XML file. How can I encode them properly when writing the XML?

A: 

Use the &#value; syntax, where value is the character's decimal Unicode code point. A space would be &#32;, and € would be &#8364;.
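If you do have to escape by hand, a minimal sketch of that syntax could look like the following. The class and method names (NumericCharRefSketch, escapeForXml) are made up for illustration, and the choice to escape everything above 127 is an assumption based on the question. It does not handle control characters that XML forbids outright.

```java
public class NumericCharRefSketch {

    // Hypothetical helper: replaces markup characters with named entities and
    // every code point above 127 with a decimal numeric character reference.
    public static String escapeForXml(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // handles surrogate pairs too
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (cp > 127) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u20AC is the euro sign; written as an escape so the source file stays ASCII.
        System.out.println(escapeForXml("Price: 5 \u20AC & up"));
        // prints: Price: 5 &#8364; &amp; up
    }
}
```

The resulting string is pure ASCII, so it survives whatever encoding the writer happens to use.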

Thorbjørn Ravn Andersen
+1  A: 

First, there's no such thing as an ASCII code above 127. ASCII only defines values up to 127. "Extended ASCII" is an ambiguous term, as it's used to describe many different encodings.

Now, as for XML: use whichever XML API you want to write the string, without worrying about the contents (so long as they are representable in XML; in XML 1.0, the control characters in the range U+0000 to U+001F other than tab, line feed, and carriage return aren't representable, unfortunately). Don't try to create the XML from scratch yourself - that's what XML APIs are for. Make sure that your XML document uses an encoding that will cope with the characters you need (UTF-8 is normally a good choice, and is often the default), make sure that your Java strings have the right Unicode data in them, and you should be fine.
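As a rough illustration of letting an API do the work, here is a sketch using the StAX writer that ships with the JDK (javax.xml.stream). The class name XmlApiSketch, the method writeDocument, and the element name "text" are placeholders chosen for this example, not anything from the question.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class XmlApiSketch {

    // Writes a one-element document containing the given text, encoded as UTF-8.
    // The writer takes care of escaping markup characters such as & and <.
    public static byte[] writeDocument(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(bytes, "UTF-8");
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("text");   // placeholder element name
            writer.writeCharacters(text);       // no manual escaping needed
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            // Unlike PrintWriter, failures surface as exceptions.
            throw new IllegalStateException("could not write XML", e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] xml = writeDocument("Price: 5 \u20AC & up");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

In real code you would pass a FileOutputStream instead of the in-memory stream. The important part is that the encoding named in the XML declaration matches the one actually used to produce the bytes.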

EDIT: I hadn't actually spotted this bit before:

I am using PrintWriter in java to write to an XML

Don't. Please use an XML API. There are plenty around, and you'll have a lot less to worry about. I'd also not recommend using PrintWriter anyway for the most part - suppressing exceptions isn't really a good idea in most cases.
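To show what "suppressing exceptions" means in practice, here is a small sketch. FailingWriter is a made-up stand-in for a destination that breaks (a full disk, say); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

public class PrintWriterErrorSketch {

    // Hypothetical writer that fails on every write, simulating an I/O problem.
    static class FailingWriter extends Writer {
        @Override
        public void write(char[] buf, int off, int len) throws IOException {
            throw new IOException("simulated I/O failure");
        }
        @Override
        public void flush() { }
        @Override
        public void close() { }
    }

    // Returns true if PrintWriter recorded an error instead of throwing one.
    public static boolean printWriterSwallowsFailure() {
        PrintWriter out = new PrintWriter(new FailingWriter());
        out.print("<text>data</text>");   // does not throw, even though the write failed
        return out.checkError();          // the only way to learn something went wrong
    }

    public static void main(String[] args) {
        System.out.println(printWriterSwallowsFailure()); // prints: true
    }
}
```

Unless the code calls checkError() after writing, a truncated or empty XML file would go unnoticed.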

Jon Skeet
My output XML file is of WordProcessingML format, so can you suggest any open source XML API for that?
Sn
@Sn: Any of them, so long as it's normal XML you want to create. You may find JDOM (http://www.jdom.org/) easier to use than the built-in APIs.
Jon Skeet