I am getting the following exception while trying to save some tweets:

Caused by: java.sql.SQLException: Incorrect string value: '\xF3\xBE\x8D\x81' for column 'twtText' at row 1
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3491)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3423)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2019)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1937)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1922)
    at org.hibernate.id.IdentityGenerator$GetGeneratedKeysDelegate.executeAndExtract(IdentityGenerator.java:94)
    at org.hibernate.id.insert.AbstractReturningDelegate.performInsert(AbstractReturningDelegate.java:57)

My table structure is given below. The table's default character set is utf8, although a few columns are declared as latin1:

 CREATE TABLE `tblkeywordtracking` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `word` varchar(200) NOT NULL,
  `tweetId` bigint(100) NOT NULL,
  `twtText` varchar(800) DEFAULT NULL,
  `negTwtText` varchar(1000) DEFAULT NULL,
  `language` text,
  `links` text,
  `negWt` double DEFAULT NULL,
  `posWt` double DEFAULT NULL,
  `tweetType` varchar(20) DEFAULT NULL,
  `source` text,
  `sourceStripped` text,
  `isTruncated` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
  `inReplyToStatusId` bigint(30) DEFAULT NULL,
  `inReplyToUserId` int(11) DEFAULT NULL,
  `isFavorited` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
  `inReplyToScreenName` varchar(40) DEFAULT NULL,
  `latitude` bigint(100) NOT NULL,
  `longitude` bigint(100) NOT NULL,
  `retweetedStatus` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
  `statusInReplyToStatusId` bigint(100) NOT NULL,
  `statusInReplyToUserId` bigint(100) NOT NULL,
  `statusFavorited` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
  `statusInReplyToScreenName` text,
  `screenName` text,
  `profilePicUrl` text,
  `twitterId` bigint(100) NOT NULL,
  `name` text,
  `location` text,
  `bio` text,
  `utcOffset` int(11) DEFAULT NULL,
  `timeZone` varchar(100) DEFAULT NULL,
  `frenCnt` bigint(20) DEFAULT '0',
  `createdAt` datetime DEFAULT NULL,
  `createdOnGMT` text CHARACTER SET latin1,
  `createdOnServerTime` datetime DEFAULT NULL,
  `follCnt` bigint(20) DEFAULT '0',
  `favCnt` bigint(20) DEFAULT '0',
  `totStatusCnt` bigint(20) DEFAULT NULL,
  `usrCrtDate` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `id` (`id`,`word`),
  KEY `twtText` (`twtText`(333)),
  KEY `word` (`word`,`tweetType`),
  KEY `posWt` (`posWt`,`negWt`)
) ENGINE=MyISAM AUTO_INCREMENT=1740 DEFAULT CHARSET=utf8;
+1  A: 

You must add a character set and a collation to column twtText. So your column should look like this:

twtText varchar(800) character set utf8 collate utf8_polish_ci DEFAULT NULL,

Replace utf8_polish_ci with whichever collation you want.

Run the following query to see the available collations:

SHOW COLLATION;
True Soft
+1  A: 

It looks like a valid UTF-8 sequence that encodes the character U+FE341.

As you can see, this is a Unicode character that uses more than 2 bytes. From this and this I deduce that MySQL still doesn't support this subset of Unicode characters (at least for versions < 5.5).
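To check this claim independently of MySQL, the four bytes from the exception message can be decoded in plain Java. This is only a minimal sketch: the class name `TweetBytes` and the helper `decodeCodePoint` are made up for illustration, and it assumes Java 7 or later for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

final class TweetBytes {
    // Hypothetical helper: decodes a UTF-8 byte sequence and returns its first code point.
    static int decodeCodePoint(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // The bytes reported in the "Incorrect string value" message.
        byte[] bytes = {(byte) 0xF3, (byte) 0xBE, (byte) 0x8D, (byte) 0x81};
        int cp = decodeCodePoint(bytes);
        // Supplementary code points are those above U+FFFF; UTF-8 needs 4 bytes for them.
        System.out.printf("U+%X, supplementary: %b%n", cp, Character.isSupplementaryCodePoint(cp));
        // prints "U+FE341, supplementary: true"
    }
}
```

So the value is well-formed UTF-8 on the Java side; the rejection happens at the MySQL column.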

adamk
3-byte UTF-8 sequences == 2-byte Unicode characters. This is a 4-byte UTF-8 sequence, which encodes Unicode code points of up to 21 bits.
adamk
You are right, I missed the extra bits of the UTF-8 encoding. 0xF3BE8D81 is 11110011 10111110 10001101 10000001 in binary; the payload bits after the UTF-8 prefixes (011, 111110, 001101, 000001) give Unicode 011111110001101000001 = 0xFE341.
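The bit arithmetic in this comment can be reproduced by masking off the UTF-8 prefix bits by hand. The sketch below is illustrative only; `ManualUtf8` and `decode4` are hypothetical names, and the method assumes its input is already a valid 4-byte sequence (it does no validation).

```java
final class ManualUtf8 {
    // Hypothetical helper: decodes one 4-byte UTF-8 sequence into a code point.
    // Assumes b0 has the form 11110xxx and b1..b3 have the form 10xxxxxx.
    static int decode4(int b0, int b1, int b2, int b3) {
        return ((b0 & 0x07) << 18)   // lead byte: keep the 3 payload bits
             | ((b1 & 0x3F) << 12)   // continuation byte: keep the 6 payload bits
             | ((b2 & 0x3F) << 6)
             |  (b3 & 0x3F);
    }

    public static void main(String[] args) {
        int cp = decode4(0xF3, 0xBE, 0x8D, 0x81);
        System.out.printf("U+%X%n", cp); // prints "U+FE341"
    }
}
```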
Michael Konietzka
A: 

MySQL 5.0/5.1 do not support 4-byte UTF-8 characters; this is a known limitation. MySQL 5.5 does support them, through the utf8mb4 character set (the older utf8 character set remains limited to 3 bytes per character).

See 9.1.10. Unicode Support
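If the server cannot be upgraded, one possible workaround (an assumption on my part, not something proposed in this thread) is to drop the 4-byte characters from the tweet text before saving it. The sketch below assumes Java 8 or later; `TweetSanitizer` and `stripSupplementary` are hypothetical names, and note that this approach silently discards data.

```java
final class TweetSanitizer {
    // Hypothetical helper: removes every code point above U+FFFF,
    // i.e. every character that UTF-8 encodes in 4 bytes.
    static String stripSupplementary(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !Character.isSupplementaryCodePoint(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Build a sample string containing U+FE341, the character from the exception.
        String tweet = "hi " + new String(Character.toChars(0xFE341)) + "!";
        System.out.println(stripSupplementary(tweet)); // prints "hi !"
    }
}
```

The cleaned string would then be passed to the entity setter in place of the raw tweet text. Replacing the characters with a placeholder such as U+FFFD instead of dropping them is an equally plausible choice.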

Michael Konietzka