views:

1317

answers:

3

I am getting the following exception:

Caused by: org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding "UTF8" has no equivalent in "WIN1252"

Is there a way to eradicate such characters, either via SQL or programmatically? (SQL solution should be preferred).

I was thinking of connecting to the DB using WIN1252 but it will give the same problem.

+2  A: 

Don't eridicate the characters, they're real and used for good reasons. Instead, eridicate Win1252.

MSalters
I wish I could, it's a requirement :)
Monis Iqbal
+1  A: 

That looks like the byte sequence 0xBD, 0xBF, 0xEF as a little-endian integer. This is the UTF8-encoded form of the Unicode byte-order-mark (BOM) character 0xFEFF.

I'm not sure what Postgre's normal behaviour is, but the BOM is normally used only for encoding detection at the beginning of an input stream, and is usually not returned as part of the result.

In any case, your exception is due to this code point not having a mapping in the Win1252 code page. This will occur with most other non-Latin characters too, such as those used in Asian scripts.

Can you change the database encoding to be UTF8 instead of 1252? This will allow your columns to contain almost any character.

devstuff
No, we cannot change the destination database encoding to UTF8.
Monis Iqbal
+1  A: 

What do you do when you get this message? Do you import a file to Postgres? As devstuff said it is a BOM character. This is a character Windows writes as first to a text file, when it is saved in UTF8 encoding - it is invisible, 0-width character, so you'll not see it when opening it in a text editor.

Try to open this file in for example Notepad, save-as it in ANSI encoding and add (or replace similar) set client_encoding to 'WIN1252' line in your file.

Tometzky
We are fetching some data from a source DB (UTF-8 encoded Teradata DB) and writing it in a destination DB (WIN1252 encoded Postgres DB).I think there is nothing we can do to convert these characters to WIN1252 as they aren't supported, that is why we wanted to ignore these while fetching from the source DB or saving in the destination DB.
Monis Iqbal
You can then import this data first to bytea and then use convert_from(string bytea, type text). You can define your own function doing a conversion using "create conversion". This function can ignore characters that are not in WIN1252.Easier would be to use a program filter for your data "iconv -f utf-8 -t windows-1252//translit" would do.
Tometzky
Easier: import data to temporary UTF-8 database in PostgreSQL, export to text file using pg_dump, convert file to WIN1252 using "iconv -f utf-8 -t windows-1252//translit", convert a file back to UTF-8 using "iconv -t windows-1252 -t utf-8" and then import this file to new WIN1252 database.
Tometzky