Turn on query logging (log_statement = all) in postgresql.conf and check which queries actually reach the server.
My bet is that the problem is in the driver (JDBC).
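If editing postgresql.conf is inconvenient, the same setting can be applied to a single database instead. This is only a sketch: the database name mydb is a placeholder, and it assumes you connect as a superuser, since log_statement cannot be changed by ordinary users.

```sql
-- Show the value currently in effect for this session.
SHOW log_statement;

-- Log every statement for new connections to the (hypothetical) database mydb.
-- Sessions that are already open keep their old setting.
ALTER DATABASE mydb SET log_statement = 'all';

-- Undo the change once the offending queries have been captured.
ALTER DATABASE mydb RESET log_statement;
```

After reconnecting from the Java application, the statements sent by the driver should appear in the server log.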
If you are trying to index natural language documents with Postgres (as far as I can see, you are trying to build an inverted index on the words of the documents), I would recommend taking a look at full text search in Postgres instead.
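As a rough illustration of what that could look like, here is a minimal sketch. The table documents, its columns, and the search words are made up for the example, and it uses the built-in 'simple' text search configuration (no stemming, no stop words) on the assumption that no Polish configuration is installed.

```sql
-- Hypothetical table holding the documents.
CREATE TABLE documents (
    id   serial PRIMARY KEY,
    body text NOT NULL
);

-- GIN index over the parsed text, so searches do not scan every row.
CREATE INDEX documents_body_fts_idx
    ON documents USING gin (to_tsvector('simple', body));

-- Find documents containing both words. The expression is the same as in
-- the index definition, which lets the planner use the index.
SELECT id
FROM documents
WHERE to_tsvector('simple', body) @@ to_tsquery('simple', 'kot & pies');
```

This replaces a hand-built word table with an index that Postgres maintains itself.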
If that is not an option, then check your encoding settings.
I suggest setting them all to UTF-8.
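Which settings to look at is my assumption here; the queries below are a sketch of how the server-side, client-side, and per-database values could be inspected (the datcollate and datctype columns exist in PostgreSQL 8.4 and later).

```sql
-- Encoding the server stores data in, and the one this session uses.
SHOW server_encoding;
SHOW client_encoding;

-- Encoding, collation and character classification of the current database.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
WHERE datname = current_database();

-- Make this session exchange text with the server as UTF-8.
SET client_encoding TO 'UTF8';
```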
If that still does not help, then I suspect some kind of escaping/encoding issue between the source of the data (your Java source code file) and the destination of the data (the database).
My further investigation revealed that the problem lies in pure Postgres SQL: I developed a pure plpgsql version which is a one-to-one port of the code above. The question restated for pure plpgsql is here: http://stackoverflow.com/questions/2089772/why-this-code-fails-in-postgresql-and-how-to-fix-it-work-around-is-it-postgres.
So it is not a Java/JDBC-related problem.
Furthermore, I've managed to simplify the test code; it now uses one table. The simplified problem was posted on the pgsql-bugs mailing list: http://archives.postgresql.org/pgsql-bugs/2010-01/msg00182.php. It is confirmed to occur on other machines (not only mine).
Here is a workaround: change the database collation from Polish to the standard 'C'. With the 'C' collation there is no error. But without the Polish collation, Polish words are sorted incorrectly (with respect to Polish national characters), so the problem should be fixed in Postgres itself.
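For reference, a sketch of how a database with the 'C' collation could be created. The database name words_c is hypothetical, and the statement assumes PostgreSQL 8.4 or later, where the collation is chosen per database. As far as I know, the collation of an existing database cannot be changed in place in those versions, so the data has to be dumped and restored into the new database.

```sql
-- template0 is needed because the locale differs from the one in template1.
-- LC_CTYPE is not specified, so it is taken from the template database.
CREATE DATABASE words_c
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'C';

-- With the 'C' collation, text is compared by byte value, so ORDER BY
-- places letters such as 'ł' after 'z' instead of between 'l' and 'm'.
```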