I need to check whether a column value (string) in a SQL Server table starts with a lowercase letter and contains only '_', '-', digits and letters. I know I can use a SQL Server CLR function for that. However, I am trying to implement the validation using a scalar UDF and have made very little progress. I can use 'NOT LIKE', but I am not sure how to validate the string irrespective of the order of its characters, or in other words, how to write a pattern in SQL for this. Am I better off using a SQL CLR function? Any help will be appreciated.

Thanks in advance

Thank you everyone for your comments. This morning I chose to go the CLR function route. For what I was trying to achieve, I created one CLR function that validates an input string and is called from a SQL UDF, and it works well.

Just to measure the performance of a T-SQL UDF that calls a SQL CLR function versus a plain T-SQL UDF, I created a SQL CLR function that simply checks whether the input string contains only lowercase letters (returning true if so, false otherwise) and called it from a UDF (IsLowerCaseCLR). I also created a regular T-SQL UDF (IsLowerCaseTSQL) that does the same thing using 'NOT LIKE'. Then I created a table (Person) with the columns Name (varchar) and IsValid (bit) and populated it with names to test.

Data: 1000 records with 'Ashish' as the value of the Name column, and 1000 records with 'ashish' as the value of the Name column.
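
The setup described above might look roughly like the following. This is only a sketch: the column sizes, the binary collation inside the function, and the use of sys.all_objects as a row source are assumptions, not the code that was actually benchmarked.

```sql
-- Hypothetical reconstruction of the test table; varchar(50) is an assumed size.
CREATE TABLE dbo.Person
(
    Name    varchar(50) NOT NULL,
    IsValid bit         NULL
);
GO

-- Plain T-SQL check: returns 1 when the string contains nothing outside a-z.
-- Note that an empty string passes and NULL yields 0 with this formulation.
CREATE FUNCTION dbo.IsLowerCaseTSQL (@value varchar(50))
RETURNS bit
AS
BEGIN
    -- The binary collation keeps the a-z range from matching
    -- uppercase or accented letters.
    RETURN CASE
               WHEN @value COLLATE Latin1_General_BIN NOT LIKE '%[^a-z]%' THEN 1
               ELSE 0
           END;
END;
GO

-- 1000 rows of each test value (requires SQL Server 2008+ for the VALUES constructor;
-- assumes sys.all_objects has at least 1000 rows, which is typical).
INSERT INTO dbo.Person (Name)
SELECT v.Name
FROM (VALUES ('Ashish'), ('ashish')) AS v (Name)
CROSS JOIN (SELECT TOP (1000) 1 AS n FROM sys.all_objects) AS r;
```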

Then I ran the following:

UPDATE Person SET IsValid = 1 WHERE dbo.IsLowerCaseTSQL(Name) = 1

This updated 1000 records (setting IsValid = 1) and took less than a second.

I deleted all the data in the table and repopulated it with the same data. Then I updated the same table using the SQL CLR UDF (setting IsValid = 1), and this took 3 seconds!

If the update runs over 5000 records, the regular UDF takes 0 seconds compared to the CLR UDF, which takes 16 seconds!

I know very little about T-SQL regular expressions, or I would have tested my actual, more complex validation criteria. But I just wanted to know: even if I could have written that, would it have been faster than the SQL CLR function, considering the example above? Are we using SQL CLR because we can implement much richer logic that would otherwise be difficult to write in regular SQL?

Sorry for this long post. I just want to know from the experts. Please feel free to ask if anything here is unclear.

Thank you again for your time.

+2  A: 

CLR is faster than a UDF; for this situation I would use CLR so that I can run regular expressions for the comparisons. But PATINDEX supports a limited pattern syntax (wildcards and character sets rather than full regex), so you could use:

WHERE PATINDEX('%[regex]%', t.column) > 0

...to return rows that satisfy the expression, because PATINDEX returns the position of the first match in the string it is testing. If the value is zero, the pattern isn't in the string.
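
For the validation in the question, one possible way to use this is to search for the first character that is not allowed, so that a result of zero means the string is clean. The sample value and the choice of Latin1_General_BIN below are assumptions for illustration.

```sql
-- Hypothetical sample value.
DECLARE @value varchar(50) = 'abc_12-x';

-- PATINDEX returns the 1-based position of the first character outside the
-- allowed set, or 0 when every character is allowed.
-- The hyphen is listed first in the set so it is treated as a literal.
SELECT PATINDEX('%[^-_a-zA-Z0-9]%', @value COLLATE Latin1_General_BIN)
           AS FirstInvalidPosition;
```

Because a single [ ] set matches exactly one character, looking for one disallowed character anywhere in the string sidesteps the lack of repetition operators such as * and +.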

OMG Ponies
Thank you for the quick response. Actually, I am stuck on writing that regex in PATINDEX. I found this in a post at http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=27205: "In a PATINDEX, or LIKE, you can use "%" for 0 or more characters, "_" for exactly one any-character, and [0-9] and [^0-9] as you would for a RegEx. But you cannot use "[0-9]*" or "[0-9]+", the use of "[0-9]" matches exactly one character. So you can use "[0-9][0-9][0-9][0-9][0-9]" to find the location of a 5 digit number, but you will struggle to NOT get mismatches on EARLIER 5 digit numbers within the string."
ydobonmai
+4  A: 
WHERE
    ASCII(LEFT(column, 1)) BETWEEN ASCII('a') AND ASCII('z')
    AND
    column COLLATE LATIN1_GENERAL_BIN NOT LIKE '%[^-_a-zA-Z0-9]%'

You need COLLATE because, under the default collation, ranges such as a-z also match accented characters (ä à ö etc)
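
If the check is needed in more than one place, the two conditions could be wrapped in a scalar UDF, which is what the question was originally after. This is a sketch only; the function name dbo.IsValidIdentifier and the varchar(100) parameter size are made up for illustration.

```sql
-- Hypothetical wrapper around the predicate above.
CREATE FUNCTION dbo.IsValidIdentifier (@value varchar(100))
RETURNS bit
AS
BEGIN
    RETURN CASE
               -- First character must be a lowercase a-z.
               WHEN ASCII(LEFT(@value, 1)) BETWEEN ASCII('a') AND ASCII('z')
                    -- No character anywhere may fall outside the allowed set.
                    AND @value COLLATE Latin1_General_BIN NOT LIKE '%[^-_a-zA-Z0-9]%'
               THEN 1
               ELSE 0   -- also covers NULL and the empty string
           END;
END;
GO

-- Possible usage against the Person table from the question.
UPDATE dbo.Person
SET IsValid = dbo.IsValidIdentifier(Name);
```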

gbn
@gbn, thank you for your time and the answer. I have a question on the expression you wrote above. Does the expression mean that the first character must not be a '-', followed by a '_', followed by a-z and so on? Or does it handle the characters in any order?
ydobonmai
@Ashish Gupta: the brackets are read as a set: - then _ then a-z then A-Z then 0-9. Finally ^ negates the set.
gbn
@gbn, Thanks again. Now, the thing is characters in my column values can appear in any order. So, I am not sure how I can make use of this expression. That said, thank you for "ASCII(LEFT(column, 1)) BETWEEN ASCII('a') AND ASCII('z')".
ydobonmai
@Ashish Gupta: This does exactly what was asked for, the second part simply makes sure that the whole string only contains '_', '-', numbers and letters.
Qtax
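One way to convince yourself that character order does not matter is to run the predicate over a few hand-picked values. The candidates below are hypothetical; the first two contain the same characters in a different order and should be treated alike, while the others each break one of the rules (uppercase first letter, a space, a dot).

```sql
-- Hypothetical test values; requires SQL Server 2008+ for the VALUES constructor.
SELECT v.Candidate,
       CASE
           WHEN ASCII(LEFT(v.Candidate, 1)) BETWEEN ASCII('a') AND ASCII('z')
                AND v.Candidate COLLATE Latin1_General_BIN NOT LIKE '%[^-_a-zA-Z0-9]%'
           THEN 1
           ELSE 0
       END AS IsValid
FROM (VALUES ('a-_9Z'), ('a9_-Z'), ('Abc'), ('a b'), ('a.b')) AS v (Candidate);
```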
This is not quite the answer that solved my problem, but it is close and I appreciate the effort. Choosing this as the answer.
ydobonmai