views:

175

answers:

3

My ultimate goal is to write a sql script that selects data from a particular table where a nvarchar(max) column contains the character NCHAR(2028).

But the obvious:

select * 
from tablename
where columnname like '%' + NCHAR(2028) + '%'

returns all rows.

+1  A: 

I think you're hitting limitations on characters that are outside of your collation. I had some weird behavior. Notice the result of the two SELECTs here:

CREATE TABLE dbo.foo
(
    id INT IDENTITY(1,1),
    bar NVARCHAR(128)
);

INSERT dbo.foo(bar) SELECT N'foobar'
UNION 
SELECT N'foo' + NCHAR(2028) + N'bar'

SELECT *
    FROM dbo.foo
    WHERE bar LIKE N'%' + NCHAR(2028) + '%';

TRUNCATE TABLE dbo.foo;

INSERT dbo.foo(bar)
SELECT N'foo' + NCHAR(2028) + N'bar'

SELECT *
    FROM dbo.foo
    WHERE bar LIKE N'%' + NCHAR(2028) + '%';

DROP TABLE dbo.foo;

Notice that whether we've inserted one row or two, we always get the first row back, even though the query is the same and the data has changed.

Unfortunately pasting the actual value of NCHAR(2028) into SSMS doesn't work because it is not in the set of supported characters (I get a glyph like a question mark box in Super Mario Brothers). Otherwise I would just suggest:

WHERE columnname LIKE N'%߬%';

If you can do that from your code (and not worry about SSMS), it may be a workable alternative.

Aaron Bertrand
PS I realize that my last line of code shows a nina accent or something, which may not be what you expect. Just imagine that you pasted an actual character representation of NCHAR(2028) into your code and see if that helps resolve it from the app side.
Aaron Bertrand
This is why everyone should learn and only use Esperanto and we would eliminate all this kind of stuff
SQLMenace
I'm in the middle of digesting this comment. I noticed that the first UNION wasn't a UNION ALL and so one row is inserted. (Somehow the UNION determined both strings to be the same)
Michael J Swart
Also note that SQL Server (in the default SQL collation at least) treats most values > NCHAR(384) as unknown, translating them to a question mark. Try this: SELECT NCHAR(ASCII(NCHAR(<many numbers > 384 (with a few exceptions), and any number greater than 1642>)));
Aaron Bertrand
You're right Michael... it seems SQL Server is ignoring the character. If you run the query with UNION ALL you will get two rows in the first query, my bad. However while the strings are considered identical, the result shows that there *is* a character in there. It just looks different depending on whether you are using results to text or results to grid.
Aaron Bertrand
charindex works
SQLMenace
But SELECT NCHAR(UNICODE(NCHAR(<numbers>))); seems saner.
Michael J Swart
UNICODE(NCHAR(2028)) still comes up with the same ? in a box character.
Aaron Bertrand
Yes, I see now. I'm beginning to hate that box.
Michael J Swart
@SQLMenace, how does charindex "work"? It is more restrictive, yes, but I tried it and got no rows instead of all rows - when I did intentionally put an NCHAR(2028) value into one of the rows.
Aaron Bertrand
+2  A: 

This works

CREATE TABLE #temp (columnname NVARCHAR(128))

INSERT #temp VALUES ('a')
INSERT #temp VALUES ( NCHAR(2028))
INSERT #temp VALUES ('b')
INSERT #temp VALUES ('c' +  NCHAR(2028) + 'c')
 INSERT #temp VALUES ('a' + NCHAR(2028) + 'b') 





 SELECT  *
FROM #temp
WHERE ColumnName COLLATE Latin1_General_Bin Like N'%' + NCHAR(2028) + '%'

drop table  #temp
SQLMenace
Check the argument order, Shouldn't your last query look like:CHARINDEX(NCHAR(2028), columnname)
Michael J Swart
fixed...I think the '%' throws it off...maybe the RegEx can't handle these characters...
SQLMenace
This should work
SQLMenace
Unfortunately this doesn't help me with my situation. When rows in #temp contain more than a single character, i.e.: INSERT #temp VALUES ('a' + NCHAR(2028) + 'b')
Michael J Swart
Michael, you just need the trailing %: LIKE N'%' + NCHAR(2028) + '%'; ... in fact you don't need the N prefix at all, I inspected the plan and there is no CONVERT_IMPLICIT to worry about.
Aaron Bertrand
Whew!!! Thanks guys
Michael J Swart
Denis, why'd you change it? The COLLATE version worked, without requiring the cumbersome PATINDEX or the OR - just needed a trailing +'%'. What I don't get is why your PATINDEX works but this doesn't: WHERE patindex('%' + NCHAR(2028) +'%', columnname) > 0 - this returns all rows.
Aaron Bertrand
Spoke too soon. THis works for the values shown, but not for a value like N'regular old string longer than one character'
Michael J Swart
Michael, the COLLATE version of the query that Denis posted earlier continues to work in that case, returning a long string with NCHAR(2028) and NOT returning a long string without it: WHERE columnname COLLATE Latin1_General_Bin LIKE '%' + NCHAR(2028) + '%';
Aaron Bertrand
works when adding this INSERT #temp VALUES ('aaaa' + NCHAR(2028) + 'aaab')
SQLMenace
I actually missed the collate version that denis posted earlier. I found it in the history. It's a version of the answer in which denis thanks G Mastros for his answer. I see that G Mastros has answered...I want to thank everyone for their attention to this. It is much appreciated.
Michael J Swart
Aaron you are right
SQLMenace
+3  A: 

Use a binary collation for your like comparison.

select * 
from tablename
where columnname COLLATE Latin1_General_Bin like '%' + NCHAR(2028) + '%'
G Mastros
Thanks G Mastros.
Michael J Swart
That is right, this works
SQLMenace