Hi all,

Recently we had an encoding problem in our system:

If we had the string "æ" in our DB, it became "Ã¦" on our web pages.

That problem is now solved, but we are left with a lot of "Ã¦" in our database: users didn't notice the bad characters and validated pre-filled forms that contained them.

I found that if you read the bytes C3 A6 as UTF-8 you'll get "æ"; if you read it in ascii you'll get "Ã¦".

It's strange because if I execute

"select convert(varbinary(40),N'æ'),convert(varbinary(40),'æ')"

I don't get the same result...

Do you have any idea how I can fix my database (i.e. change all "Ã¦" to "æ")?

thx

A: 

As far as I know, the only way to fix this is to use Replace:

Update Table
Set Column = Replace(Column, N'Ã¦', N'æ')

In this case, I'm assuming that the column is now Unicode (i.e. nvarchar or nchar).

Thomas
I know how to do a replace, but here I want to deal with every weird character: "ó" became "Ã³", for example. I'd like to do something that will handle every case (as I said in my post, there is a logical connection between the expected character and the bad one, so there must be a way to go back).
remi bourgarel
@remi bourgarel - That's my point. There is no silver bullet solution other than manually correcting the data through a series of calls to Replace. You will not get a 1:1 matching because in some cases, Unicode might have encoded your text as two characters instead of one character.
Thomas
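If the Replace route is taken, the pairs do not have to be typed by hand, since each bad string follows mechanically from the good character. The following is only a sketch in Python, assuming the damage was UTF-8 read as windows-1252 and limited to U+00A0 through U+00FF; `replace_pairs`, `update_statements`, `MyTable` and `MyColumn` are hypothetical names, not anything from the question.

```python
def replace_pairs(wrong_encoding="cp1252"):
    """List (bad, good) pairs for the characters U+00A0..U+00FF."""
    pairs = []
    for code_point in range(0x00A0, 0x0100):
        good = chr(code_point)
        try:
            # Reproduce the damage: UTF-8 bytes read with the wrong code page.
            bad = good.encode("utf-8").decode(wrong_encoding)
        except UnicodeDecodeError:
            # Python's cp1252 leaves a few bytes (e.g. 0x81) undefined,
            # so those characters cannot be mapped this way and are skipped.
            continue
        pairs.append((bad, good))
    return pairs


def update_statements(table, column, pairs):
    """Build one UPDATE ... REPLACE statement per pair (names are placeholders)."""
    return [
        f"UPDATE {table} SET {column} = REPLACE({column}, N'{bad}', N'{good}');"
        for bad, good in pairs
    ]


# Hypothetical table and column names, for illustration only.
statements = update_statements("MyTable", "MyColumn", replace_pairs())
```

Two caveats: the skipped characters still need separate handling, and sequential replaces can misfire on text that legitimately contained "Ã" or "Â" followed by another symbol, so the generated statements should be reviewed before they are run.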
A: 

if you read it in ascii you'll get "Ã¦".

ASCII only assigns characters to the bytes 00-7F. There are, however, several "extended ASCII" encodings in which C3 A6 represents "Ã¦", including the popular Western European encodings ISO-8859-1 and windows-1252, and the Turkish ISO-8859-9 and windows-1254.
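This is easy to check outside the database. A small sketch in Python (my choice of language, not something from the question) that decodes the same two bytes under each of the encodings named above:

```python
raw = bytes([0xC3, 0xA6])  # the two bytes under discussion

# Decode the same bytes with each encoding mentioned above.
decoded = {
    name: raw.decode(name)
    for name in ("utf-8", "iso-8859-1", "cp1252", "iso-8859-9", "cp1254")
}

# Strict ASCII defines nothing above 0x7F, so decoding fails outright.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Here `decoded["utf-8"]` is the single character "æ", the other four entries are the two-character string "Ã¦", and `ascii_ok` ends up `False`.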

To fix your encoding problem, simply:

  1. Encode the string to a byte array using code page 1252 (or 1254 for Turkish). This should produce the UTF-8 bytes.
  2. Decode the byte array to a string using UTF-8.
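
The two steps above can be sketched as follows. This is Python, run outside the database, and only an illustration: `fix_mojibake` is a hypothetical helper name, and leaving strings that do not round-trip untouched is my assumption, not part of the steps.

```python
def fix_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Undo UTF-8 text that was mistakenly decoded as `wrong_encoding`."""
    try:
        # Step 1: re-encode with the wrong code page to recover the raw bytes.
        raw = text.encode(wrong_encoding)
        # Step 2: decode those bytes as UTF-8, which is what they really were.
        return raw.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption: a string that does not round-trip was not damaged
        # in this way, so it is returned unchanged.
        return text
```

With this, "Ã¦" comes back as "æ" and "Ã³" as "ó", while an already correct "æ" is left alone because its single byte E6 is not valid UTF-8 on its own. Pass `wrong_encoding="cp1254"` for the Turkish case.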
dan04
Do you have any idea if I can do it with SQL?
remi bourgarel