I need to store (possibly long) text in a MySQL database. The text may contain special characters and non-Latin letters, and it should be possible to run a full-text search on it. MySQL 5 can't store such characters (though MySQL 6 will be able to), so I thought about URL-encoding the text before storing it and decoding it after fetching it. Do you think this is a good idea? Has anyone done something like this? Do you have alternative solutions?

A: 

Not a bad idea.

Rather than only decoding after fetching, you should also plan to URL-encode any search terms before building the search query. If your app-side logic routes all data through a single set of encoding/decoding functions, then no unencoded value should slip through the cracks.
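
A minimal sketch of that idea in Python, assuming the application funnels every write, read, and search term through the same helpers. The function names are hypothetical and not part of any library. Spaces are left unencoded so that words stay separate. How MySQL's full-text parser tokenizes the percent-escapes is not covered here and would need checking against a real index.

```python
from urllib.parse import quote, unquote


def encode_text(text: str) -> str:
    """URL-encode text (as UTF-8) so only ASCII reaches the database.

    Spaces are kept as-is so word boundaries survive (an assumption of
    this sketch, not a MySQL requirement).
    """
    return quote(text, safe=" ")


def decode_text(stored: str) -> str:
    """Reverse encode_text after fetching a row."""
    return unquote(stored)


def encode_search_term(term: str) -> str:
    """Encode a search term exactly the way stored text was encoded."""
    return encode_text(term)


stored = encode_text("café au lait")
term = encode_search_term("café")
print(stored)               # the ASCII-only form that would be stored
print(term in stored)       # the encoded term is a substring of the encoded text
print(decode_text(stored))  # round-trips to the original string
```

Because the search term goes through the same function as the stored text, the two encodings cannot drift apart.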

Also, is it possible that a VARBINARY data type would dodge the problem? (I could Google this part myself, but it's late and I'm sleepy. Just helping brainstorm, bedtime now.)

DreadPirateShawn
You're right about the need to encode/decode everything; I'll think about a way to implement it. I thought about using VARBINARY and BLOB fields, but it's not possible to do full-text search on these types.
Gabriel
+1  A: 

Why not use Unicode, encoded as UTF-8? MySQL 5 supports it.

Paul Dixon
Because some of the problematic characters are not UTF-8, but UTF-16, which MySQL 5 doesn't support. When I try to store such characters through JDBC, they are stored as '?', and when I try to do it through MySQL Administrator, it throws an exception.
Gabriel
Could you not simply transcode from UTF-16 to UTF-8?
Paul Dixon
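
The transcoding suggested in the comment above can be sketched in Python as follows. This is only an illustration, assuming the input arrives as raw UTF-16 bytes; the function name is hypothetical. Note that characters outside the Basic Multilingual Plane become four-byte UTF-8 sequences, which MySQL's `utf8` character set (three bytes per character at most) does not accept; `utf8mb4`, added in MySQL 5.5.3, does.

```python
def utf16_to_utf8(raw: bytes) -> bytes:
    """Transcode UTF-16 bytes to UTF-8 bytes.

    The "utf-16" codec honours a leading byte-order mark; without one it
    falls back to the platform's native byte order.
    """
    text = raw.decode("utf-16")
    return text.encode("utf-8")


sample = "héllo 日本".encode("utf-16")  # Python prepends a BOM here
print(utf16_to_utf8(sample).decode("utf-8"))

# A character outside the Basic Multilingual Plane (U+1D11E, a musical
# clef) is a surrogate pair in UTF-16 and four bytes in UTF-8.
clef = utf16_to_utf8("\U0001D11E".encode("utf-16"))
print(len(clef))
```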
A: 

MySQL's Unicode full-text search is smart enough to match related characters like "á" and "ä" when searching for "a". So I wouldn't store URL-encoded text, but would use MySQL's own options instead.
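
To illustrate what accent-insensitive matching means, here is a rough Python sketch that folds accented letters to their base letters before comparing. It is not how MySQL implements it (MySQL does this through the column's collation, for example `utf8_general_ci`), and the helper names are hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Drop combining marks so that "á" and "ä" both compare as "a"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(term: str, text: str) -> bool:
    """Case- and accent-insensitive substring match."""
    return fold_accents(term).lower() in fold_accents(text).lower()


print(fold_accents("á"), fold_accents("ä"))
print(matches("a", "ä"))
print(matches("cafe", "Un café, s'il vous plaît"))
```

URL-encoding the stored text would hide these relationships from the database, since "á" and "a" no longer share any bytes once encoded.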

Arjan