I have noticed that Google Toolbox for Mac replaces several SQLite built-in functions (LOWER/UPPER, LIKE, GLOB) with its own versions that handle string locales better.

So, question to everyone who has SQLite experience: have you ever had any problems with non-English locales in SQLite? Does one really have to do something to properly handle non-English alphabets? What kinds of problems are expected if I use SQLite APIs as-is?

(I'm going to use SQLite on the iPhone, but I guess it's the same across all platforms. I've been using Core Data previously and never had any problems, but this time I want to switch to non-ORMed db access.)

+1  A: 

It seems that SQLite does not care about locale at all. The only place where I found locales mentioned is the computation of datetime('now'), and even there the documentation says that the behavior depends on the underlying C functions. It does store text data as Unicode by default (in versions 3.0 and above), but converting to Unicode is the responsibility of the client libraries.

By the way, the SQLite console under MS Windows is one of those rare console applications that still works as expected when you switch the console codepage to UTF-8.

UPD:

Some citations from the SQLite documentation:

one:

When SQLite compares two strings, it uses a collating sequence or collating function (two words for the same thing) to determine which string is greater or if the two strings are equal. SQLite has three built-in collating functions: BINARY, NOCASE, and RTRIM.

  • BINARY - Compares string data using memcmp(), regardless of text encoding.
  • NOCASE - The same as binary, except the 26 upper case characters of ASCII are folded to their lower case equivalents before the comparison is performed. Note that only ASCII characters are case folded. SQLite does not attempt to do full UTF case folding due to the size of the tables required.
  • RTRIM - The same as binary, except that trailing space characters are ignored.

An application can register additional collating functions using the sqlite3_create_collation() interface.
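To make the NOCASE limitation concrete, here is a minimal sketch using Python's standard sqlite3 module, whose create_collation() wraps sqlite3_create_collation(). The collation name UNICODE_NOCASE, the comparison function, and the sample table are made up for illustration, and comparing case-folded strings by code point is only an approximation of locale-aware collation.

```python
import sqlite3
import unicodedata

def unicode_nocase(a: str, b: str) -> int:
    # Hypothetical collation: normalize to NFC, apply full Unicode case
    # folding, then compare by code point. This is not locale-aware sorting.
    ka = unicodedata.normalize("NFC", a).casefold()
    kb = unicodedata.normalize("NFC", b).casefold()
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("UNICODE_NOCASE", unicode_nocase)

conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?)",
    [("Émile",), ("émile",), ("EMILE",)],
)

# Built-in NOCASE folds only A-Z, so 'É' and 'é' stay different.
builtin = conn.execute(
    "SELECT count(*) FROM names WHERE name = 'ÉMILE' COLLATE NOCASE"
).fetchone()[0]

# The custom collation also treats 'É' and 'é' as equal.
custom = conn.execute(
    "SELECT count(*) FROM names WHERE name = 'ÉMILE' COLLATE UNICODE_NOCASE"
).fetchone()[0]

print(builtin, custom)
```

With NOCASE only the row 'Émile' matches, because the accented letter must be byte-identical; the custom collation additionally matches 'émile'. The same collation name can be used in ORDER BY or in a column definition.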

two:

lower(X) The lower(X) function returns a copy of string X with all ASCII characters converted to lower case. The default built-in lower() function works for ASCII characters only. To do case conversions on non-ASCII characters, load the ICU extension.

upper(X) The upper(X) function returns a copy of input string X in which all lower-case ASCII characters are converted to their upper-case equivalent.
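Without rebuilding SQLite with ICU, an application can also override lower() and upper() on its own connection. The sketch below does this from Python with create_function(); the helper names are hypothetical, and Python's str.lower()/str.upper() stand in for whatever Unicode-aware routine the host platform provides.

```python
import sqlite3

def unicode_lower(value):
    # Pass NULLs and non-text values through unchanged.
    return value.lower() if isinstance(value, str) else value

def unicode_upper(value):
    return value.upper() if isinstance(value, str) else value

conn = sqlite3.connect(":memory:")

# What the built-in returns depends on how the SQLite library was built;
# without ICU, the non-ASCII letters are expected to come back unchanged.
print(conn.execute("SELECT lower('ÄÖÜ ABC')").fetchone()[0])

# Registering functions under the built-in names replaces them
# for this connection only.
conn.create_function("lower", 1, unicode_lower)
conn.create_function("upper", 1, unicode_upper)

lowered = conn.execute("SELECT lower('ÄÖÜ ABC')").fetchone()[0]
uppered = conn.execute("SELECT upper('привет')").fetchone()[0]
print(lowered, uppered)
```

The overrides apply per connection, so they have to be registered every time the database is opened.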

three:

SQLITE_ENABLE_ICU This [compilation] option causes the International Components for Unicode or "ICU" extension to SQLite to be added to the build.


Either way, it seems you will need to compile SQLite yourself.

newtover
Thanks. But my question still stands: given this situation, what kind of surprises should I expect if I just use SQLite, say, with UTF-8-encoded data?
Andrey Tarantsov
Thanks a lot. I think your answer boils down to two things: (1) no case-insensitive comparisons/filtering for non-ASCII text, and (2) no proper sorting. (W.r.t. SQLITE_ENABLE_ICU: I've seen mentions of rejections from the App Store because symbols in a custom-built ICU match some of the prohibited symbols that their automated scanning tools look for. So building a custom SQLite with ICU seems like a recipe for lots of pain in the ass, even though it does not violate the agreement.)
Andrey Tarantsov
...So I guess the conclusion we've come to is: Google Toolbox for Mac has a good reason to replace SQLite string functions with its custom ones, and we'd better use it. Thanks again for being helpful.
Andrey Tarantsov
@Andrey: thanks. The question made me look through the docs more thoroughly, and I found out several interesting things for myself =)
newtover