views:

192

answers:

2

I'm writing web-app using MySQL version 5.1.45, Tomcat 5.5.28 and Hibernate 3

When I'm trying to save string that contains non-latin characters (for example Упячка) error occurs:

1589 [main] WARN org.hibernate.util.JDBCExceptionReporter - SQL Error: 1366, SQLState: HY000
1589 [main] ERROR org.hibernate.util.JDBCExceptionReporter - Incorrect string value: '\xD0\xA3\xD0\xBF\xD1\x8F...' for column 'name' at row 1

Hibernate connection settings

<property name="connection.driver_class">com.mysql.jdbc.Driver</property>
<property name="connection.url">jdbc:mysql://localhost/E2012?characterEncoding=UTF8&amp;useUnicode=true</property>
<property name="connection.username">***</property>
<property name="connection.password">***</property>
<property name="hibernate.connection.charSet">UTF8</property>

MySQL config My.cnf

[client]
 default-character-set=utf8

[mysqld]
 default-character-set=utf8

Even query set name utf-8 doesn't resolve problem

Thanks for help!

A: 

Try using UTF-8 for the characterEncoding param in your JDBC url, rather than UTF8 (notice the dash).

That happened to me before.

ryanprayogo
In fact, wherever you have `UTF8` in your example, I have it as `UTF-8`
ryanprayogo
utf8 without a dash it is a result of a few hours of experiments: with dash, without dash, in upper or lower case. Unfortunaly all experiments was unsussefull :(
glebreutov
+3  A: 

In UTF-8, Упячка should actually be represented as \x423\x43F\x44F\x447\x43A\x430. The \xD0\xA3\xD0\xBF\xD1\x8F... implies that they are incorrectly been encoded using ISO-8859-1.

Here's a test snippet which proves this:

String s = new String("Упячка".getBytes("UTF-8"), "ISO-8859-1"); // First decode with UTF-8, then (incorrectly) encode with ISO-8859-1.
for (char c : s.toCharArray()) {
    System.out.printf("\\x%X", (int) c);
}

Which prints

\xD0\xA3\xD0\xBF\xD1\x8F\xD1\x87\xD0\xBA\xD0\xB0

So your problem needs to be solved one step before. Since you're talking about a Java webapplication and this string is likely result from user input, are you sure that you have taken care about the HTTP request and response encodings? First, in JSP, you need to add the following to top of the JSP:

<%@ page pageEncoding="UTF-8" %>

This not only renders the page in UTF-8, but it also implicitly sets a HTTP Content-Type response header instructing the client that the page is rendered using UTF-8, so that the client knows that it should display any content and process any forms using the same encoding.

Now, the HTTP request part, for GET requests you need to configure the servletcontainer in question. In Tomcat for example, this is a matter of setting the URIEncoding attribute of the HTTP connector in /conf/server.xml accordingly. For POST requests this should already be taken care by the client (webbrowser) being smart enough to use the response encoding as specified in the JSP. If it doesn't, then you'll need to bring in a Filter which checks and sets the request encoding.

For more background information you may find this article useful.


Apart from this all, MySQL has another problem with Unicode characters. It only supports UTF-8 characters up to 3 bytes, not 4 bytes. In other words, only the BMP range of 65535 characters is supported, outside not. PostgreSQL for example supports it fully. This may not hurt your webapplication, but this is certainly something to keep in mind.

BalusC
I'm using unit test to test encoding. I have tried different file encoding: utf-8 and ISO-8859-1 and even MacCyrillic (I use OS X). All without success.
glebreutov