views:

314

answers:

2

I created a database in PostgreSQL with "encoding = 'UTF8'", and loaded some UTF8 data in it. Selecting works fine, but when I try to do a "WHERE UPPER(name) = 'FOO'" in a query, I get an error

ERROR:  invalid multibyte character for locale

My research seems to indicate that this is because the PostgreSQL installation was "initdb"-ed with LANG=en_US rather than LANG=en_US.UTF8. Doing a "SHOW LC_COLLATE" shows "en_US". I don't want to have to dump and recreate all my databases, because several of them are PostGIS and it's a royal pain to recreate those. Is there a work-around, like a way to do the equivalent of "UPPER" that works for UTF8?

Update I ended up doing the dump, reinitdb, and restore of the database, and it was less painful than I thought it would be, except for a bit of a problem figuring out where the data was supposed to go because the postgres user doesn't set the PGDATA environment variable, and neither does any config file or shell script that I could find.

+1  A: 

I don't think the workaround you want is feasible, but dump and restore of your PostGIS-enabled databases should work fine. I regularly dump and restore databases with the PostGIS functions and data with geom objects.

What kind of problems do you have?

cope360
The problem I have is that when I do a pg_dumpall, and then restore, it starts complaining about duplicate function definitions and the like.
Paul Tomblin
+1  A: 

You diagnostic is right, it is a common problem with Unicode in PostgreSQL. The installation procedure tried to be smart and initdbed with the locale of the shell running it :-(

I suggest that, if you cannot dump and restore your database, you have a problem more serious and more urgent that uppercasing the data. IMHO, you should address this problem first, before you have to actually restore your data after a new version of PostgreSQL was issued (or after a hard disk failure).

bortzmeyer