views: 1685

answers: 4
The client has asked for a number of tables to be extracted into CSVs, which is all done, no problem. They've just asked that we make sure the files are always in UTF-8 format.

How do I check that this is actually the case, or, even better, force it to be so? Is it something I can set in a procedure before running a query, perhaps?

The data is extracted from an Oracle 10g database.

What should I be checking?

Thanks

+2  A: 

You can check the database character set with the following query:

select value from nls_database_parameters 
where parameter='NLS_CHARACTERSET'

If it says AL32UTF8 then your database is in the character set you need, and as long as the export doesn't mangle the data on the way out, you are done.
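
The national character set, which applies to NCHAR/NVARCHAR2 columns, is a separate parameter and can be checked the same way:

select value from nls_database_parameters 
where parameter='NLS_NCHAR_CHARACTERSET'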

You can read about Oracle globalization support here, and about NLS parameters like the one above here.

rics
Isn't that the database character set? Oracle may convert if it thinks your client requires a different character set.
WW
A: 

What you have to look for is whether the eight-bit (extended ASCII) characters in the input, if any, are translated into multi-byte UTF-8 sequences.

This is highly dependent on your local code page, but typically: "£", which is x'A3' in ISO 8859-1, magically becomes x'C2A3' in UTF-8.
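
If you want to see the raw bytes Oracle has actually stored, DUMP in hex format is a quick check (the table and column names below are just placeholders):

select dump(some_column, 16) as raw_bytes
from some_table
where rownum <= 5

A "£" stored in an AL32UTF8 database should show up as c2,a3 in that output.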

James Anderson
A: 

How, exactly, are you generating the CSV files? Depending on the exact architecture, there will be different answers.

If you are, for example, using SQL*Plus to extract the data, you would need to set NLS_LANG on the client machine to something appropriate (e.g. AMERICAN_AMERICA.AL32UTF8) to force the data to be sent to the client machine in UTF-8. If you are using other approaches, NLS_LANG may or may not be important.
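
As a rough sketch (the table, column, and file names here are invented, not from the original setup), a SQL*Plus extraction could look like this, with NLS_LANG set to AMERICAN_AMERICA.AL32UTF8 in the environment before sqlplus is launched so that the spooled file is written in UTF-8:

set heading off feedback off pagesize 0 trimspool on
spool /tmp/some_table.csv
select col1 || ',' || col2 || ',' || col3 from some_table;
spool off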

Justin Cave
A: 

OK, it wasn't as simple as I first hoped. The query above returns AL32UTF8.

I am using a stored procedure compiled on the database that loops through a list of table names held in an array inside the procedure.

I use the DBMS_SQL package to build the SQL and UTL_FILE.PUT_NCHAR to write the data to a text file.

I believed my resultant output would then be in UTF-8; however, opening it in TextPad says it's ANSI, and the data is garbled in places :)

Cheers

It might be important that NLS_CHARACTERSET is AL32UTF8 while NLS_NCHAR_CHARACTERSET is AL16UTF16.
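
For reference, a stripped-down sketch of the UTL_FILE side using the Unicode file calls (the directory object, table, and column names are placeholders, not the real procedure); per the UTL_FILE documentation, a file opened with FOPEN_NCHAR is written in UTF-8 even though the national character set is AL16UTF16:

declare
  -- handle for the output file
  l_file utl_file.file_type;
begin
  -- open the file in Unicode mode; UTL_FILE writes its contents in UTF-8
  l_file := utl_file.fopen_nchar('DATA_DIR', 'some_table.csv', 'w', 32767);
  for r in (select col1, col2 from some_table) loop
    -- build one CSV line as NVARCHAR2 and write it out
    utl_file.put_line_nchar(l_file, to_nchar(r.col1) || ',' || to_nchar(r.col2));
  end loop;
  utl_file.fclose(l_file);
end;
/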

Robert
The file is just a sequence of bytes. TextPad is showing you how it would translate those bytes into displayable characters on the assumption that it is an ANSI file. I use PSPad, which allows you to select which encoding it uses to interpret the bytes. I guess TextPad has something similar.
Gary
Interesting thanks, that does change my understanding somewhat.
Robert