views:

76

answers:

3

I have a table with an erroneous symbol('�') in a number of rows in one column.

The TSQL script below does not work.

UPDATE S
SET S.Offering_Details = REPLACE(S.Offering_Details, '�', '...')
FROM tblSacrifices S

The column in question has datatype of nvarchar(230) and allows null entries.

The data came from a csv file converted from Excel an d via a Visual studio windows app. The data originally was '...' but I think perhaps word/excel classed this as one character (rather than 3 separate '.'). When my application read the original string from CSV file it (unintentionally) replaced the '...' with '�' before submitting the data into the database.

PLEASE help

+1  A: 

Try being explicit that the character you are searching for is nvarchar by using a leading N:

UPDATE S
SET S.Offering_Details = REPLACE(S.Offering_Details, N'�', '...')
FROM tblSacrifices S

UPDATE Based on the discussion from Martin's answer:

I've no reason to think that this is functionally any different that what I've posted above, but you could try:

UPDATE S
SET S.Offering_Details = REPLACE(S.Offering_Details, NCHAR(65533), '...')
FROM tblSacrifices S

UPDATE 2

I didn't read Martin's code closely enough before updating my answer. Because it uses the SQL server UNICODE function to display results, the actually error character is masked by 65533.

See here for details of unicode char 65533 - it's the generic unicode value for bad data.

Ed Harper
Thanks Ed but this doesn't work either. Please see my reply to Martin in my original post. Some extra info may assist in answering the question. Thanks again
Chris
+1 for Update2. Any idea if there is an alternative way?
Martin Smith
Thanks both of you. As you probably already know that doesn't work either. Is there a way perhaps to remove the 'bad data' and then replace but then we wouldn't knwo where to put it. I'm stuck!
Chris
Would a convert to binary work?
Chris
Thank's guys for your help. Convert to binary worked.
Chris
A: 

Edit:

Following update in comments try REPLACE(S.Offering_Details, nchar(65533), '...')

Original Answer:

This might help you troubleshoot it

declare @s nvarchar(230)

SELECT @s= ProblemCol
FROM YourTable
WHERE ProblemRowId = X;


set @s = N'日本国'; /*For testing*/


WITH N AS
     (SELECT 1 idx,
             LEFT(@s,1)ch,
             UNICODE(LEFT(@s,1)) C

     UNION ALL

     SELECT idx+1,
            SUBSTRING(@s,idx+1,1),
            UNICODE(SUBSTRING(@s,idx+1,1)) C
     FROM   N
     WHERE  idx<LEN(@s)
     )


  SELECT idx, ch,C
  FROM     N    

Result of Test

idx         ch   C
----------- ---- -----------
1           日    26085
2           本    26412
3           国    22269
Martin Smith
By the way - "Results to text" in management studio seems to handle these chars better than results to grid - at least with my setup.
Martin Smith
Thanks Martin, I've put the results in the my original post. They're the same as yours. I'm unsure what this means though.
Chris
@Chris - Yes sorry I meant for you to delete this line `set @s = N'日本国'; /*For testing*/` it was just to give a demo of what it would return. Once you find out the problem character code(s) you could do Replace with `nchar(26085)` or whatever if you just want to get rid of them.
Martin Smith
I see now, the results returned a value of 65533 for the symbol '�'.
Chris
@Chris - If the decimal value for the mystery character is 65533, I'm pretty confident that the script in my answer should work.
Ed Harper
So far as I can see from the Unicode site character lookup http://www.unicode.org/charts/ there is no such character. Does `REPLACE(S.Offering_Details, nchar(65533), '...')` work?
Martin Smith
@Ed - Why do you say that? `select unicode('�')` gives 63 for me not 65533
Martin Smith
@Martin Smith - I get 65533 from `select unicode(N'�')` - if you don't include the N, you get `'?'`, which is 63
Ed Harper
@Ed - Doh! Absolutely Correct. I guess I was probably using the unicode lookup charts wrong then.
Martin Smith
@Martin - I've updated my answer with a bit more detail. Still no fix, though
Ed Harper
+2  A: 

Figured it out. Thanks all for your help.

I had to convert to binary. All unicode characters above 65500 require this as normal REPLACE() doesn't work.

UPDATE S
SET S.Offering_Details = REPLACE(S.Offering_Details, nchar(65533) COLLATE Latin1_General_BIN, '...')
FROM tblSacrifices S
Chris
@Chris - You are replacing one character with three so the result can be longer. Can you not just use the elipsis single character ` N'…'` (Otherwise I think you will need to make the column wider or use `LEFT` to truncate yourself or `SET ANSI_WARNINGS OFF` to ignore the error and allow truncation.
Martin Smith
I see Thanks Martin. I just extended the max allowable characters for the column as I didn't want to lose any of the data.
Chris