UTF8 issues on Linux

Hi,

I have some code that fetches data from a database whose codepage is UTF8. When I run the code on a Linux box, some characters come out as question marks (?), but when I run the same code on a Windows server, all characters appear correctly.

When I check $LANG in the shell, the following is returned: en_SG.UTF-8

en_SG doesn't look correct to me; I expected en_US. The latter part of the returned string is UTF-8, though, which is good. Is there anything else I can look into to fix the character corruption problem?

Generally, ? appears when the font you have does not have a representation for that Unicode codepoint. What are you viewing the output in, and what font are you using?

Yann Ramin 2010-06-10 18:43:03

Special characters such as the trademark and registered signs are appearing as question marks. My code's output goes into an XML file. When I view the contents of the XML file in the shell, I see question marks. If I copy the file from the Linux machine to a Windows machine, I still see question marks.

2010-06-10 18:58:05

Question marks might also appear when a charset conversion happens and the target charset has no code for the source character. Apparently the conversion happened on Linux and did not happen on Windows. Note that some DB client libraries do not use $LANG; e.g. the Oracle client uses the $NLS_LANG variable instead. It is worth rechecking the DB client documentation.
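To illustrate the lossy conversion described above, here is a minimal sketch assuming a Java client on Java 7 or later (for StandardCharsets). The class name CharsetLossDemo and the sample string are made up for illustration. It encodes text into a target charset and decodes it back; characters the target charset cannot represent are replaced during encoding, which for ISO-8859-1 and US-ASCII means a question mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the asker's code.
public class CharsetLossDemo {

    // Encode the text into the given charset, then decode the bytes again
    // with the same charset. String.getBytes(Charset) substitutes the
    // charset's replacement bytes for anything it cannot map.
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // \u2122 is the trademark sign, \u00AE is the registered sign.
        String text = "Acme\u2122 Widget\u00AE";
        System.out.println("UTF-8:      " + roundTrip(text, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + roundTrip(text, StandardCharsets.ISO_8859_1));
        System.out.println("US-ASCII:   " + roundTrip(text, StandardCharsets.US_ASCII));
    }
}
```

What the terminal shows also depends on the console's own encoding, so comparing the returned strings in code is more reliable than reading the printed output.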

Dummy00001 2010-06-10 20:06:02

Hmm, I do not have $NLS_LANG set on my Linux machine, but it is set on my Windows machine. Maybe this is why I get garbage characters on Linux? I can try setting this environment variable.
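One way to compare the two machines is to print what the JVM itself sees. The sketch below assumes the code runs in Java; the class name EncodingReport is hypothetical. It only reports the values and does not show whether a particular database driver honours NLS_LANG, which depends on the driver and should be checked in its documentation.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper: run it on both machines and compare.
public class EncodingReport {

    // Collect the locale and encoding settings visible to this JVM.
    // Unset variables are reported as the string "null".
    public static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("LANG", String.valueOf(System.getenv("LANG")));
        report.put("NLS_LANG", String.valueOf(System.getenv("NLS_LANG")));
        report.put("file.encoding", String.valueOf(System.getProperty("file.encoding")));
        report.put("defaultCharset", Charset.defaultCharset().name());
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

If defaultCharset differs between the Linux and Windows runs, any I/O call that does not name a charset explicitly is a likely place for the conversion.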

2010-06-10 20:29:25

Can you please provide information about the environment? What programming language are you working with, what library or methods are you using to connect to and pull information from the database, and what library or methods are you using to output the data to file?

I am assuming that both instances of running your code (on Windows and Linux) are accessing the data from the same physical database.

The culprit I would be looking for is one of your I/O steps converting the Unicode data to some other codepage (probably ASCII or Latin1).

It could be that the database itself is converting because the database methods are defaulting to a different encoding. It could be that the database methods are converting the incoming information because the language itself is defaulting to a different codepage. It could be that the output methods are converting.
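For the output side, a minimal sketch, assuming Java 7 or later and using only the standard library (the class name ExplicitUtf8Output and the sample XML are hypothetical, not the asker's code). It names the charset explicitly when writing, so the result does not depend on the platform default, which on older JVMs is derived from the locale.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example of writing and reading a file with a fixed charset.
public class ExplicitUtf8Output {

    // Write the content as UTF-8 regardless of the JVM's default charset.
    public static void writeUtf8(Path target, String content) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            out.write(content);
        }
    }

    // Read the whole file back, decoding the bytes as UTF-8.
    public static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("utf8-demo", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<name>Acme\u2122</name>\n";
        writeUtf8(target, xml);
        // Compare in code rather than trusting how a terminal renders it.
        System.out.println("round trip intact: " + readUtf8(target).equals(xml));
        Files.delete(target);
    }
}
```

The same idea applies when an XML library does the writing: hand it an OutputStream or a Writer created with an explicit charset rather than something like FileWriter, which on older JVMs uses the default encoding.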

Sam Rodgers 2010-06-10 19:49:11

I am running Red Hat Enterprise Linux 3.
The programming language is Java.
I am using JDOM to write my XML files.
Yes, the database instance is the same (whether I run the code from Windows or Linux).

2010-06-10 20:11:35

No Java experts swooping in to save the day? :(

Sam Rodgers 2010-06-12 01:12:09
