I have an nvarchar field whose contents look like ASCII-encoded text: "Zard Frères Guesta"

How do I convert it to a readable (Unicode) format in T-SQL?

A: 

I think what you are saying is that the two bytes of what should be a single Unicode character have ended up as two consecutive Unicode characters (the high bytes are probably all 0). This can happen for all sorts of reasons if code pages aren't handled correctly during insertion.

You'll need to get these back to single bytes and then re-encode the data.

Cade Roux
The values are stored in a text file in that format (e.g. Zard Frères Guesta), and I am using a bulk insert with OPENROWSET to import the data.
Guazz
select somecolofdata from OPENROWSET(BULK ''' + @filePath + '\' + @fileName +''', FORMATFILE=''' + @formatFileName + ''', ERRORFILE =''' + @errorFileName + ''' , FIRSTROW = 2, CODEPAGE = 1252, MAXERRORS = 1000 ) as importeddata
Guazz
This is the format file entry: <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="500" COLLATION="Latin1_General_CI_AS"/>
Guazz
<COLUMN SOURCE="1" NAME="somcolofdata" xsi:type="SQLNVARCHAR"/>
Guazz
I am storing it as nvarchar(500)
Guazz
@Guazz Is this (è) meant to be two bytes for a single character (code-point)? Because then that's not code page 1252 (which is a single byte encoding).
Cade Roux
@Guazz If this is UTF-8 (varying widths, with ASCII as a subset), then BCP can't handle that code page: http://msdn.microsoft.com/en-us/library/ms162802.aspx
Cade Roux
How can I tell if it's UTF-8 or not?
Guazz
@Guazz I pasted it into Notepad, wrapped it in html/body tags, opened it in a browser and set the character encoding to UTF-8. I get: Zard Frères Guesta, so it looks like UTF-8 to me. You need to convert the file to UTF-16 (two bytes per character for text like this) before using BCP, and then use the equivalent of the -w option in your BULK INSERT.
Cade Roux
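The file conversion suggested above could be sketched as follows in Python. This is only one possible approach, assuming the source file really is UTF-8; `utf8_to_utf16` and its parameter names are made up for the illustration.

```python
from pathlib import Path


def utf8_to_utf16(src: Path, dst: Path) -> None:
    """Rewrite a UTF-8 text file as UTF-16 LE with a byte order mark."""
    # "utf-8-sig" drops a leading UTF-8 BOM if one is present.
    # newline="" keeps the original row terminators untouched.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin, \
         open(dst, "w", encoding="utf-16-le", newline="") as fout:
        fout.write("\ufeff")  # explicit BOM marking the output as UTF-16 LE
        for line in fin:
            fout.write(line)
```

The converted file would then be loaded as wide-character data, for example with `DATAFILETYPE = 'widechar'` in BULK INSERT, or probably with `NCharTerm` in place of `CharTerm` in the XML format file used by OPENROWSET.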
@Guazz Apparently you can use code page 65001 in SQL Server 2005 bcp, but not in 2008: https://connect.microsoft.com/SQLServer/feedback/details/370419/bulk-insert-and-bcp-does-not-recognize-codepage-65001?wa=wsignin1.0
Cade Roux
A: 

I think your problem is that you're using CODEPAGE = 1252 (a single-byte character set) when you load the data with OPENROWSET.

Try using 1202 (UTF-16), or possibly 1208 (UTF-8)

Cheers

amir75
I believe code page 65001 should do the trick. https://connect.microsoft.com/SQLServer/feedback/details/370419/bulk-insert-and-bcp-does-not-recognize-codepage-65001?wa=wsignin1.0
Cade Roux