I have an XML file that contains a message with HTML tags in it. The XML file is read by a Java class that mails it to people. When the mail is received, the accents do not show. For example, é doesn't show.

I have tried &eacute; in the XML, but Eclipse gives an error saying that the entity has not been declared.
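For context: XML predefines only five named entities (&amp;, &lt;, &gt;, &apos;, &quot;), so &eacute;, which comes from HTML, is undeclared unless a DTD defines it. A numeric character reference such as &#233; needs no declaration. The minimal sketch below checks this with the JDK's built-in DOM parser; the class name NumericReferenceDemo is hypothetical and this is not the asker's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class NumericReferenceDemo {
    // Parses an XML string with the JDK's built-in DOM parser and returns the
    // text content of the root element; parse errors become IllegalArgumentException.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // &#233; is the numeric character reference for U+00E9 (e-acute); no DTD needed.
        System.out.println(rootText("<message>hello &#233;</message>"));
        try {
            // &eacute; is an HTML entity; without a DTD the parser rejects it as undeclared.
            rootText("<message>hello &eacute;</message>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```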

I also tried simply inserting é directly, but nothing shows up in the final output.

The third thing I tried was wrapping it in <![CDATA[é]]>, but that broke the parser: nothing after it was output.

However, I noticed something weird. When I put something like this in the XML and switched the declared encoding to UTF-16:

<message>text bla bla blaa é&lt; 

it did output the é at the end, like this: bla bla blaa blaa é.

EDIT: <message>text bla bla blaa éé&lt; outputs ?é or just a single é.

The file looks something like this:

<?xml version="1.0" encoding="UTF-16"?>

<message>
&lt;b&gt;hello é &lt;/b&gt;
</message>

What gives?

A: 
ataylor
+2  A: 

Did you try changing the encoding to UTF-8?

Wiliam Witter
I initially tried with the XML file in UTF-8 and then switched it to UTF-16.
Ayrad
+1  A: 

The encoding you declare in the XML declaration (<?xml ... encoding="..."?>) MUST match the "real" encoding that was used to edit and save the XML file on your hard drive.

If you edited your XML file with Notepad on Windows in a Western European locale, it is most likely encoded in cp1252 (the default "ANSI" code page Windows uses in that situation; cp1252 is a slight variant of the standardized ISO-8859-1 that, among other things, adds the euro sign).
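A minimal sketch of why such a mismatch loses the accent, assuming the file was saved as cp1252 but decoded as UTF-8 (for example because it declares encoding="UTF-8"). The class and method names are hypothetical, and the "file" is just a byte array:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates saving text the way Notepad's "ANSI" mode does on a Western European
    // Windows machine: e-acute (U+00E9) becomes the single byte 0xE9.
    static byte[] saveAsCp1252(String text) {
        return text.getBytes(CP1252);
    }

    // Simulates a reader that decodes the bytes with the given charset
    // (for example whatever the encoding="..." declaration claims).
    static String readAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "hello \u00e9";
        byte[] onDisk = saveAsCp1252(original);

        // 0xE9 is not a complete UTF-8 sequence, so it is replaced with U+FFFD.
        String wrong = readAs(onDisk, StandardCharsets.UTF_8);
        System.out.println(wrong.equals(original));        // false
        System.out.println(wrong.indexOf('\uFFFD') >= 0);  // true

        // Decoding with the charset actually used to save the bytes round-trips.
        System.out.println(readAs(onDisk, CP1252).equals(original)); // true
    }
}
```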

I would suggest using an editor that lets you control exactly which encoding is used when editing and saving (like http://jedit.org), so you can guarantee that the file's actual encoding and the encoding declared in its content (that is, in the XML declaration) are the same.
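If the file is generated from Java rather than edited by hand, the same rule applies: name one charset and use it both in the declaration and for the bytes. A minimal sketch under that assumption; ConsistentXmlWriter, encodeXml and the temporary file are illustrative only:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConsistentXmlWriter {
    // The single charset used both for the bytes and in the declaration.
    static final Charset FILE_CHARSET = StandardCharsets.UTF_8;

    // Builds the file content: a declaration naming FILE_CHARSET ("UTF-8"),
    // then the body, encoded with that very same charset.
    static byte[] encodeXml(String body) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + FILE_CHARSET.name() + "\"?>\n" + body;
        return xml.getBytes(FILE_CHARSET);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used purely for illustration.
        Path file = Files.createTempFile("message", ".xml");
        Files.write(file, encodeXml("<message>&lt;b&gt;hello \u00e9&lt;/b&gt;</message>"));
        System.out.println("wrote " + file);
    }
}
```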

EDIT
It also depends greatly on how your Java program reads and uses the XML file. If it hands the file to an XML parser as a byte stream, the parser honors the declared encoding and it should be OK. Otherwise, the file has to be saved in whatever encoding the Java code reads it with: readers such as FileReader use the platform default charset unless a charset is named explicitly (typically cp1252 on a Western European Windows install; UTF-8 by default since Java 18). If the Java class explicitly reads the file with some other encoding, you'll have to comply with that.
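A minimal sketch of the parser route, assuming the message is read with the JDK's DOM parser (javax.xml.parsers) rather than whatever the asker's class actually uses; ParserReadsDeclaration and messageText are hypothetical names. Passing an InputStream instead of a Reader lets the parser detect the encoding from the byte-order mark and the encoding="..." declaration, here UTF-16 as in the question:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class ParserReadsDeclaration {
    // Gives the parser raw bytes, not a Reader, so it decodes them itself
    // according to the BOM and the encoding="..." declaration.
    static String messageText(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(in)
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse message XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><message>hello \u00e9</message>";
        // Java's "UTF-16" encoder writes a big-endian byte-order mark, matching the declaration.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println(messageText(new ByteArrayInputStream(utf16)).equals("hello \u00e9"));
    }
}
```

In real code the stream would come from something like Files.newInputStream(path). If the class reads the file as text instead, naming the charset explicitly, e.g. Files.newBufferedReader(path, StandardCharsets.UTF_8), avoids depending on FileReader's platform default.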

EDIT 2
It also depends on the mail client and how it handles encoding, and on the charset the Java class declares for the message when it sends the mail...

zim2001
+1. Stop trying to find workarounds for your encoding issues; fix them!
Michael Borgwardt
I did message = message.replaceAll("é", "&eacute;"); right when the HTML is generated. It seems to work, but it feels like a workaround :|
Ayrad
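If the replaceAll workaround is kept, a sketch that generalizes it to every non-ASCII character (not just é) by emitting numeric character references, which are plain ASCII and therefore survive any ASCII-compatible charset. NonAsciiEscaper is a hypothetical name, and this still sidesteps rather than fixes the underlying encoding problem:

```java
public class NonAsciiEscaper {
    // Replaces every non-ASCII code point with a numeric character reference
    // (e-acute U+00E9 -> &#233;), leaving ASCII, including HTML markup, untouched.
    // Iterating over code points keeps characters outside the BMP intact.
    static String escapeNonAscii(String html) {
        StringBuilder out = new StringBuilder(html.length());
        html.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("<b>hello \u00e9\u00e8</b>")); // <b>hello &#233;&#232;</b>
    }
}
```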