views: 509

answers: 2

Hi,

I have a PostgreSQL/PostGIS spatial database that contains Hebrew text columns. The system runs on Ubuntu, and everything works flawlessly with UTF-8.

I am trying to dump some tables to shapefiles for a Windows program which can only read Windows-1255 strings. Unfortunately, pgsql2shp has no encoding option (shp2pgsql has one), so the Windows program reads the UTF-8 output as Windows-1255, producing gibberish.

I have been trying to create a Windows-1255 view of the table columns, but found no way of doing it without corrupting the database.

Any ideas how to convert the tables?

Thanks,

Adam

UPDATE:

I thought this one was solved (see my own answer), but I still get random errors like:

ERROR:  character 0x9f of encoding "WIN1255" has no equivalent in "UTF8"

What I want is some kind of omit functionality, like iconv's -c flag, which simply skips source characters that have no equivalent in the target encoding.
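For reference, iconv's -c behaviour can be demonstrated on the command line (glibc's name for Windows-1255 is CP1255; the sample string is made up for illustration):

```shell
# -c drops source characters with no equivalent in the target encoding
# instead of aborting; Greek omega (U+03A9) does not exist in CP1255,
# so it is silently omitted from the output.
printf 'abc\xce\xa9def' | iconv -f UTF-8 -t CP1255 -c
# prints: abcdef
```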

+1  A: 

If you really mean ASCII, you can't possibly rescue Hebrew characters. ASCII is only the 7-bit character set up to \x7F.

So what kind of strings does this Windows program read? If it's ASCII or Latin-1, you'll never get Hebrew. More likely it's “the current system code page”, also (misleadingly but commonly) known in Windows as ‘ANSI’.

If that's the case you will have to set the system code page on every machine that runs the Windows program to Hebrew (code page 1255). I believe shp files have no character encoding information at all, so the shapefiles will only ever work correctly on machines with this code page set (the default only in the Israel locale). (Apparently .dbf exports can have an accompanying .cpg file to specify the encoding, but I've no idea if the program you're using supports that.)
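If the reader does support a sidecar, it is just a tiny text file next to the shapefile. A minimal sketch (the basename is hypothetical, and the exact string a given reader expects is an assumption; conventions vary between "1255", "CP1255", and "WINDOWS-1255"):

```shell
# Hypothetical shapefile basename "roads": write a .cpg sidecar naming
# the .dbf code page so encoding-aware readers can pick it up.
echo "CP1255" > roads.cpg
```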

Then you'd have to export the data as code page 1255, or the nearest you're going to get in Postgres, ISO-8859-8. Since the export script doesn't seem to have any option to do anything but take bytes directly from the database, you'd have to create a database in the ISO-8859-8 encoding and transfer all the data from the UTF-8 database to the 8859-8 one, either directly through queries or, perhaps more easily, by using pg_dumpall, loading the SQL into Notepad, and re-saving it as Hebrew instead of UTF-8 (adjusting any encoding settings listed in the SQL DDL as you go).
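A command-line sketch of that transfer, assuming hypothetical database names (not runnable without a live server, and depending on your cluster's locale you may also need matching --lc-collate/--lc-ctype options):

```shell
# template0 avoids inheriting the UTF-8 template's encoding;
# pg_dump's -E/--encoding flag sets the dump's client encoding,
# so the server converts the text on the way out.
createdb --template=template0 --encoding=ISO_8859_8 gisdb_heb
pg_dump --encoding=ISO_8859_8 gisdb | psql gisdb_heb
```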

I wonder if the makers of the Windows program could be persuaded to support UTF-8? It's a bit sad to be stuck with code-page specific software in this century.

bobince
+1 Thanks, corrected to Windows-1255. I thought of converting the entire DB to ISO-8859-8, but it seems quite excessive when all I need is a single column converted.
Adam Matan
A: 

From within the bash script:

# Ask the user which client encoding to use, then export it so that
# libpq-based tools run later in the script convert on the way out.
select ENCODING in UTF8 WIN1252 WIN1255 ISO-8859-8
do
        if [[ -n $ENCODING ]]; then
                export PGCLIENTENCODING=$ENCODING
                break
        else
                echo 'Invalid encoding.'
        fi
done

The export PGCLIENTENCODING=$ENCODING statement does the trick.

Adam Matan
As commented above, this is my old answer, and it ceased to work.
Adam Matan