views:

75

answers:

1

Hi All,

I want to use BCP to load into a SQL Server 2005 table with an nvarchar field using a loader control file. As I understand it, SQL Server 2005 only supports UTF-16 (and I believe it is UTF-16 LE). The file is being output by a Java program. The way I have it currently set up is as follows:

  1. An XML format BCP loader file (created using the following command: bcp test_table format nul -c -x -T -f test_table.xml -S server)

  2. A Java program using the following code to write the output:

    File f = new File("from_java.txt");
    String encoding = "x-UTF-16LE-BOM";
    OutputStream os = new FileOutputStream(f);
    OutputStreamWriter outputStreamWriter = new OutputStreamWriter(os, encoding);
    String theString = "áááááLittle Endian, BOM\r\n";
    outputStreamWriter.append(theString);
    outputStreamWriter.flush();
    outputStreamWriter.close();
    
  3. Then using the following bcp command:
    bcp test_table in from_java.txt -T -f test_table.xml -S server -error error.txt

What I get in the table is ÿþá. and not áááááLittle Endian, BOM

I've tried a few different permutations of changing parameters:

  • changing the way I generate the loader control file (using -n for native data instead of -c for character data...I think this may have something to do with it, but I didn't see any improvement in my inserted data)
  • tried several different forms of the UTF-16 encoding, including big endian and little endian with no BOM, to no avail
  • tried to output the BOM manually in the file as I read somewhere that Microsoft really like to make use of BOM information
  • looked into trying to output the file as UCS-2 (instead of UTF-16) as that is (apparently) what BCP is actually reading the file in as
  • tried -w on the bcp import, this does work, but not in conjunction with a loader format file (is there a way to incorporate whatever magic tells BCP that the file is encoded in UTF-16 into the format file?)
  • I can get it to work if I output the file in windows-1252 and specify that codepage as a -c 1252 option to bcp when I load the file (but I don't want to do this as I will be losing information as UTF-16 is a superset of what can be represented compared to 1252)

Has anyone managed to get bcp to load into an nvarchar field using UTF-16 data in conjunction with a loader format configuration file?

Thanks in advance,

-James

A: 

I have been literally underwhelmed with responses, but I have cracked it.

The loader file needs to be generated with a -w flag, so the command to generate the file is:

bcp <table> format nul -w -x T -f loader-control-w-format.xml -S <server> -t "||"

This leads to a loader control file that looks a little different, you get entries such as:

<FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="|\0|\0" MAX_LENGTH="1000" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>

Note the delimiter is listed as |\0|\0, the zeroes correspond to the extra byte in the file as UTF-16 (or just "unicode" as Microsoft (wrongly) call it) is a double byte character encoding.

Some notes for the sanity of anyone else dealing with BCP in this way:

  • When SQLServer talks of "native" they mean native characters, i.e. accented characters
  • When SQLServer talks of Unicode, what they actually mean is the UTF16 (Little Endian) manner of encoding the Unicode characterset. This is what the -w pertains to
  • When writing a file for loading into BCP using UTF-16, the file has to be in UTF-16 Little Endian format, and cannot contain a UTF BOM (as BCP will interpret this a byte that should be loaded and your first record will contain the BOM, urgh!)

The Java code for writing out a file in UTF-16 that can be loaded in this way is as follows:

    final File f = new File("C:\\temp\\bcp_prob\\from_java-UTF-16.txt");
    //LE with no BOM is important here:
    final String encoding = "UTF-16LE";
    final OutputStream os = new FileOutputStream(f);
    final OutputStreamWriter outputStreamWriter = new OutputStreamWriter(os, encoding);
    final String theString = "UTF-16-LE, intermetálico básicos intermetálico película magnética dinámicos||another_col\r\n";        
    outputStreamWriter.append(theString);
    outputStreamWriter.flush();
    outputStreamWriter.close();
James B