I am doing some work for a French client, so I need to handle accented characters. I'm running into a lot of difficulty and am hoping the solution is simple and that somebody can point it out to me.
The string "La Forêt pour Témoin" is converted to "La For? pour T?oin".
Note that the character after each accented letter is also missing: the t after the ê and the m after the é.
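One way to reproduce this kind of garbling is a charset mismatch between the sender and the receiver. The sketch below assumes (this is not confirmed by the post) that the text is sent as ISO-8859-1, where ê and é are single bytes (0xEA and 0xE9), and read back as UTF-8, where those bytes start multi-byte sequences and are therefore malformed. The exact result depends on the decoder: the post shows the following letter being swallowed, while a modern JDK decoder typically replaces only the accented byte with U+FFFD (often displayed as "?"). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Encodes text with one charset and decodes the bytes with another,
    // mimicking a browser and a server that disagree about the encoding.
    static String roundTrip(String text, Charset sentAs, Charset readAs) {
        byte[] bytes = text.getBytes(sentAs);
        // new String(byte[], Charset) replaces malformed input with U+FFFD.
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "La For\u00EAt pour T\u00E9moin";
        // Hypothetical scenario: sent as ISO-8859-1 (one byte per accented letter)
        // but read as UTF-8, where 0xEA and 0xE9 are lead bytes of multi-byte
        // sequences, so the decoder sees malformed input.
        String garbled = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Reading with the charset that was actually used restores the text.
        String restored = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original));  // false
        System.out.println(restored.equals(original)); // true
    }
}
```

If a round trip like this reproduces the symptom, the fix belongs wherever the bytes are decoded, not in any later escaping step.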
I have tried StringEscapeUtils, which successfully escapes some characters, such as ă. I have also written my own escape function, which produces the same results (ă works, ê does not):
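// Escapes every character in the string as a decimal numeric character reference (&#N;).
private String escapeChars(String string) {
    // StringBuilder avoids creating a new String on every iteration.
    StringBuilder result = new StringBuilder();
    // Walk the string by code point so a character outside the Basic
    // Multilingual Plane becomes one &#N; reference, not two surrogates.
    for (int i = 0; i < string.length(); ) {
        int c = string.codePointAt(i);
        result.append("&#").append(c).append(';');
        i += Character.charCount(c);
    }
    return result.toString();
}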
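Escaping can only preserve characters that are already correct in memory; if the String has been garbled before it reaches the function, no escaping scheme will bring the missing letters back. Given a correctly decoded string, a lighter variant that escapes only non-ASCII characters plus the HTML-special characters might look like the sketch below. The class and method names are hypothetical, and this is not how StringEscapeUtils is implemented.

```java
public class HtmlEscape {
    // Escapes non-ASCII characters and the HTML-special characters & < > " '
    // as decimal numeric character references (&#N;); other ASCII passes through.
    // Works on code points, so a character outside the BMP becomes one reference.
    static String escapeForHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            boolean special = cp == '&' || cp == '<' || cp == '>' || cp == '"' || cp == '\'';
            if (cp >= 128 || special) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeForHtml("La For\u00EAt pour T\u00E9moin"));
        // prints La For&#234;t pour T&#233;moin
    }
}
```

If this prints the expected references for a hard-coded test string but not for the submitted value, the damage happened before escaping.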
The project runs in Eclipse using the App Engine plugin. I cannot narrow down whether the problem is caused by Java, App Engine, or SQLite.
Any help is appreciated.
EDIT: I have found that the strings are already malformed when I simply display the request parameter from a form (i.e., request.getParameter("string") already returns malformed content).
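This EDIT points at decoding rather than escaping: the servlet container turns the raw form bytes into the Strings returned by getParameter, and it has to pick a charset to do so. In the Servlet API, request.setCharacterEncoding(...) selects that charset, and it only takes effect if it is called before the first parameter is read. The sketch below uses java.net.URLDecoder as a stand-in for the container's own parser (an assumption; containers have their own implementations) and a hypothetical form value, to show that the same submitted bytes come out correct or garbled depending on the charset chosen.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodeDemo {
    // Hypothetical form value as a browser would send it from a UTF-8 page:
    // spaces become '+', and each accented letter becomes two %XX bytes.
    static final String SENT_VALUE = "La+For%C3%AAt+pour+T%C3%A9moin";

    // Decodes an application/x-www-form-urlencoded value using the given charset,
    // similar to what the container does when parameters are first parsed.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String expected = "La For\u00EAt pour T\u00E9moin";
        // Matching charset: the original text comes back.
        System.out.println(decode(SENT_VALUE, "UTF-8").equals(expected));      // true
        // Mismatched charset: each UTF-8 byte pair turns into two Latin-1 letters.
        System.out.println(decode(SENT_VALUE, "ISO-8859-1").equals(expected)); // false
    }
}
```

The garbling differs from the post because the mismatch runs the other way here (UTF-8 bytes read as ISO-8859-1); either direction means the browser and the server disagree.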
I have tried the meta tag suggested by Daniel, with no success, although I think you are on the right track. The head of the HTML document is as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
When accented characters are hard-coded into a JSP, they are displayed as intended.
EDIT: I have also added <?xml version="1.0" encoding="UTF-8"?> to the very start of the page.
I am very close to a solution. I have found that if I change the page encoding from within the browser, the form data is passed to the server properly. I cannot figure out how to make the browser auto-detect the page encoding. The head of the page is now:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
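Changing the encoding by hand in the browser changes the bytes it submits, because browsers generally encode form data in the character encoding the page was rendered with, unless the form element has an accept-charset attribute. A charset in the HTTP Content-Type response header takes precedence over a meta tag, so one possible (unconfirmed) explanation is that the server sends a header naming a different charset; in a JSP, the page directive's contentType attribute, for example <%@ page contentType="text/html; charset=UTF-8" %>, controls that header. The sketch below uses java.net.URLEncoder to approximate what a browser sends for the same word under two page encodings; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormEncodeDemo {
    // Percent-encodes a form value roughly the way a browser does for
    // application/x-www-form-urlencoded submissions, using the page's charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String value = "For\u00EAt";
        System.out.println(encode(value, "UTF-8"));      // For%C3%AAt
        System.out.println(encode(value, "ISO-8859-1")); // For%EAt
    }
}
```

Whichever charset the page actually uses, the server has to decode the request with the same one.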
RESOLVED: I couldn't work out how to make the browser auto-detect the UTF-8 encoding that Java defaults to, so I have forced the request's character encoding to ISO-8859-1 using request.setCharacterEncoding("ISO-8859-1").