ansaurus

Question

Character with encoding UTF8 has no equivalent in WIN1252

Answer 1

+2 A:

Don't eridicate the characters, they're real and used for good reasons. Instead, eridicate Win1252.

MSalters 2009-10-14 09:53:09

I wish I could, it's a requirement :)

Monis Iqbal 2009-10-14 10:00:49

Answer 2

+1 A:

That looks like the byte sequence 0xBD, 0xBF, 0xEF as a little-endian integer. This is the UTF8-encoded form of the Unicode byte-order-mark (BOM) character 0xFEFF.

I'm not sure what Postgre's normal behaviour is, but the BOM is normally used only for encoding detection at the beginning of an input stream, and is usually not returned as part of the result.

In any case, your exception is due to this code point not having a mapping in the Win1252 code page. This will occur with most other non-Latin characters too, such as those used in Asian scripts.

Can you change the database encoding to be UTF8 instead of 1252? This will allow your columns to contain almost any character.

devstuff 2009-10-14 10:05:27

No, we cannot change the destination database encoding to UTF8.

Monis Iqbal 2009-10-14 10:17:20

Answer 3

+1 A:

What do you do when you get this message? Do you import a file to Postgres? As devstuff said it is a BOM character. This is a character Windows writes as first to a text file, when it is saved in UTF8 encoding - it is invisible, 0-width character, so you'll not see it when opening it in a text editor.

Try to open this file in for example Notepad, save-as it in ANSI encoding and add (or replace similar) set client_encoding to 'WIN1252' line in your file.

Tometzky 2009-10-14 14:00:53

We are fetching some data from a source DB (UTF-8 encoded Teradata DB) and writing it in a destination DB (WIN1252 encoded Postgres DB).I think there is nothing we can do to convert these characters to WIN1252 as they aren't supported, that is why we wanted to ignore these while fetching from the source DB or saving in the destination DB.

Monis Iqbal 2009-10-15 04:11:51

You can then import this data first to bytea and then use convert_from(string bytea, type text). You can define your own function doing a conversion using "create conversion". This function can ignore characters that are not in WIN1252.Easier would be to use a program filter for your data "iconv -f utf-8 -t windows-1252//translit" would do.

Tometzky 2009-10-15 12:53:06

Easier: import data to temporary UTF-8 database in PostgreSQL, export to text file using pg_dump, convert file to WIN1252 using "iconv -f utf-8 -t windows-1252//translit", convert a file back to UTF-8 using "iconv -t windows-1252 -t utf-8" and then import this file to new WIN1252 database.

Tometzky 2009-10-15 13:53:57

ansaurus

tags:

views:

answers:

Character with encoding UTF8 has no equivalent in WIN1252

related questions