ansaurus

Question

Write a T-SQL query that filters records containing NCHAR(2028)

Answer 1

+1 A:

I think you're hitting limitations on characters that are outside of your collation. I had some weird behavior. Notice the result of the two SELECTs here:

CREATE TABLE dbo.foo
(
    id INT IDENTITY(1,1),
    bar NVARCHAR(128)
);

INSERT dbo.foo(bar) SELECT N'foobar'
UNION 
SELECT N'foo' + NCHAR(2028) + N'bar'

SELECT *
    FROM dbo.foo
    WHERE bar LIKE N'%' + NCHAR(2028) + '%';

TRUNCATE TABLE dbo.foo;

INSERT dbo.foo(bar)
SELECT N'foo' + NCHAR(2028) + N'bar'

SELECT *
    FROM dbo.foo
    WHERE bar LIKE N'%' + NCHAR(2028) + '%';

DROP TABLE dbo.foo;

Notice that whether we've inserted one row or two, we always get the first row back, even though the query is the same and the data has changed.

Unfortunately pasting the actual value of NCHAR(2028) into SSMS doesn't work because it is not in the set of supported characters (I get a glyph like a question mark box in Super Mario Brothers). Otherwise I would just suggest:

WHERE columnname LIKE N'%߬%';

If you can do that from your code (and not worry about SSMS), it may be a workable alternative.

Aaron Bertrand 2010-02-02 20:01:05

PS I realize that my last line of code shows a nina accent or something, which may not be what you expect. Just imagine that you pasted an actual character representation of NCHAR(2028) into your code and see if that helps resolve it from the app side.

Aaron Bertrand 2010-02-02 20:04:28

This is why everyone should learn and only use Esperanto and we would eliminate all this kind of stuff

SQLMenace 2010-02-02 20:05:47

I'm in the middle of digesting this comment. I noticed that the first UNION wasn't a UNION ALL and so one row is inserted. (Somehow the UNION determined both strings to be the same)

Michael J Swart 2010-02-02 20:10:20

Also note that SQL Server (in the default SQL collation at least) treats most values > NCHAR(384) as unknown, translating them to a question mark. Try this: SELECT NCHAR(ASCII(NCHAR(<many numbers > 384 (with a few exceptions), and any number greater than 1642>)));

Aaron Bertrand 2010-02-02 20:12:09

You're right Michael... it seems SQL Server is ignoring the character. If you run the query with UNION ALL you will get two rows in the first query, my bad. However while the strings are considered identical, the result shows that there *is* a character in there. It just looks different depending on whether you are using results to text or results to grid.

Aaron Bertrand 2010-02-02 20:14:21

charindex works

SQLMenace 2010-02-02 20:14:30

But SELECT NCHAR(UNICODE(NCHAR(<numbers>))); seems saner.

Michael J Swart 2010-02-02 20:15:03

UNICODE(NCHAR(2028)) still comes up with the same ? in a box character.

Aaron Bertrand 2010-02-02 20:29:17

Yes, I see now. I'm beginning to hate that box.

Michael J Swart 2010-02-02 20:30:15

@SQLMenace, how does charindex "work"? It is more restrictive, yes, but I tried it and got no rows instead of all rows - when I did intentionally put an NCHAR(2028) value into one of the rows.

Aaron Bertrand 2010-02-02 20:31:08

Answer 2

+2 A:

This works

CREATE TABLE #temp (columnname NVARCHAR(128))

INSERT #temp VALUES ('a')
INSERT #temp VALUES ( NCHAR(2028))
INSERT #temp VALUES ('b')
INSERT #temp VALUES ('c' +  NCHAR(2028) + 'c')
 INSERT #temp VALUES ('a' + NCHAR(2028) + 'b') 





 SELECT  *
FROM #temp
WHERE ColumnName COLLATE Latin1_General_Bin Like N'%' + NCHAR(2028) + '%'

drop table  #temp

SQLMenace 2010-02-02 20:14:12

Check the argument order, Shouldn't your last query look like:CHARINDEX(NCHAR(2028), columnname)

Michael J Swart 2010-02-02 20:18:17

fixed...I think the '%' throws it off...maybe the RegEx can't handle these characters...

SQLMenace 2010-02-02 20:22:42

This should work

SQLMenace 2010-02-02 20:26:50

Unfortunately this doesn't help me with my situation. When rows in #temp contain more than a single character, i.e.: INSERT #temp VALUES ('a' + NCHAR(2028) + 'b')

Michael J Swart 2010-02-02 20:28:01

Michael, you just need the trailing %: LIKE N'%' + NCHAR(2028) + '%'; ... in fact you don't need the N prefix at all, I inspected the plan and there is no CONVERT_IMPLICIT to worry about.

Aaron Bertrand 2010-02-02 20:34:11

Whew!!! Thanks guys

Michael J Swart 2010-02-02 20:38:30

Denis, why'd you change it? The COLLATE version worked, without requiring the cumbersome PATINDEX or the OR - just needed a trailing +'%'. What I don't get is why your PATINDEX works but this doesn't: WHERE patindex('%' + NCHAR(2028) +'%', columnname) > 0 - this returns all rows.

Aaron Bertrand 2010-02-02 20:43:47

Spoke too soon. THis works for the values shown, but not for a value like N'regular old string longer than one character'

Michael J Swart 2010-02-02 20:44:13

Michael, the COLLATE version of the query that Denis posted earlier continues to work in that case, returning a long string with NCHAR(2028) and NOT returning a long string without it: WHERE columnname COLLATE Latin1_General_Bin LIKE '%' + NCHAR(2028) + '%';

Aaron Bertrand 2010-02-02 20:46:44

works when adding this INSERT #temp VALUES ('aaaa' + NCHAR(2028) + 'aaab')

SQLMenace 2010-02-02 20:47:52

I actually missed the collate version that denis posted earlier. I found it in the history. It's a version of the answer in which denis thanks G Mastros for his answer. I see that G Mastros has answered...I want to thank everyone for their attention to this. It is much appreciated.

Michael J Swart 2010-02-02 20:48:37

Aaron you are right

SQLMenace 2010-02-02 20:49:08

Answer 3

+3 A:

Use a binary collation for your like comparison.

select * 
from tablename
where columnname COLLATE Latin1_General_Bin like '%' + NCHAR(2028) + '%'

G Mastros 2010-02-02 20:46:31

Thanks G Mastros.

Michael J Swart 2010-02-02 20:48:57

That is right, this works

SQLMenace 2010-02-02 20:50:11

ansaurus

tags:

views:

answers:

Write a T-SQL query that filters records containing NCHAR(2028)

related questions