I need a solution (a T-SQL function or procedure) that parses a SQL Server varchar(max) string and removes all special characters and accents.

The cleaned string will be written to a CSV file by an AWK script that breaks on special characters such as '&', '%', and '\', and on accented characters (like the ç in français), which turn into unknown characters during conversion. That is why I need this parser.

Thank you

A: 

If I got you right:

SELECT REPLACE('abc&de','&','_')
Steffen
Sure, but that would mean I have to find every character the script breaks on. Can you at least give me a list of such characters that I can pass to SQL's REPLACE function? Or some other method to at least get rid of all the accented characters in my string, like switching collation and eliminating the unknown characters?
Paul
Plus, the SQL REPLACE function is not case sensitive here: if I try something like select `replace('Montréal is nicE', 'é', 'e')` the output is `Montreal is nice`, so it treats `E` as an `é`.
Paul
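The matching Paul describes is governed by the collation in effect, not by REPLACE itself. A minimal sketch, assuming the default collation is case- and accent-insensitive and that Latin1_General_BIN is an acceptable collation to force for the comparison:

```sql
-- Under a case- and accent-insensitive collation, 'é' also matches 'e' and 'E'.
SELECT REPLACE('Montréal is nicE', 'é', 'e');

-- Forcing a binary collation makes REPLACE compare exact code points,
-- so only the literal 'é' is replaced and 'E' is left alone.
SELECT REPLACE('Montréal is nicE' COLLATE Latin1_General_BIN, 'é', 'e');
```

The COLLATE clause applies only to that expression, so nothing has to change on the column or the database.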
Ahh, ok, got you. I guess you only want a-z, A-Z and 0-9 in your result then? You should go with regular expressions, but SQL Server does not natively support them. If it really has to be done in T-SQL, maybe this works: http://www.sqlteam.com/article/regular-expressions-in-t-sql. I haven't tested it myself, but it looks like it could work.
Steffen
+1  A: 

You can try this:

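-- Lookup table of every single-byte character that is not a space, digit, or ASCII letter
CREATE TABLE dbo.Bad_ASCII_Characters (ascii_char CHAR(1) NOT NULL)

DECLARE @i INT
SET @i = 1
WHILE @i <= 255
BEGIN
    IF  (@i <> 32) AND                   -- keep the space
        (@i NOT BETWEEN 48 AND 57) AND   -- keep 0-9
        (@i NOT BETWEEN 65 AND 90) AND   -- keep A-Z
        (@i NOT BETWEEN 97 AND 122)      -- keep a-z
    BEGIN
        INSERT INTO dbo.Bad_ASCII_Characters (ascii_char) VALUES(CHAR(@i))
    END

    SET @i = @i + 1
END

DECLARE @row_count INT
SET @row_count = 1

-- Repeat until no row contains a bad character any more
WHILE (@row_count > 0)
BEGIN
     -- CHARINDEX instead of LIKE: '%', '_' and '[' are in the table and would act as
     -- LIKE wildcards, so the loop would never end. The binary collation stops 'é'
     -- from matching 'e' or 'E' under a case- or accent-insensitive default collation.
     UPDATE T
     SET my_column = REPLACE(T.my_column COLLATE Latin1_General_BIN, BAC.ascii_char, '')
     FROM My_Table T
     INNER JOIN dbo.Bad_ASCII_Characters BAC ON
          CHARINDEX(BAC.ascii_char, T.my_column COLLATE Latin1_General_BIN) > 0

     SET @row_count = @@ROWCOUNT
END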

I haven't tested it, so you might need to tweak it a bit. You can either generate the table on the fly each time, or leave it in place; then if your requirements change slightly (for example, you find some characters that the AWK script does parse correctly), you can just change the data in the table.

The WHILE loop around the update is needed because a value may contain several different special characters, and each pass removes only one of them per row. If your table is very large you might see some performance issues here.

Tom H.
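
Since the question asks for a function, the same idea can also be sketched as a scalar function that works on one value at a time. This is only a sketch under assumptions: the name dbo.StripNonAlphanumeric is hypothetical, the set of characters to keep (letters, digits, space) is taken from the loop above, and Latin1_General_BIN is assumed to be an acceptable collation for the pattern match. Accented characters are removed, not converted to their base letter.

```sql
-- Hypothetical helper: removes every character that is not a-z, A-Z, 0-9 or a space.
CREATE FUNCTION dbo.StripNonAlphanumeric (@input VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @pos INT

    -- The binary collation makes the ranges cover only the plain ASCII letters,
    -- so accented letters fall outside them and are found by the pattern.
    SET @pos = PATINDEX('%[^a-zA-Z0-9 ]%', @input COLLATE Latin1_General_BIN)

    WHILE @pos > 0
    BEGIN
        -- Delete the single offending character, then search again.
        SET @input = STUFF(@input, @pos, 1, '')
        SET @pos = PATINDEX('%[^a-zA-Z0-9 ]%', @input COLLATE Latin1_General_BIN)
    END

    RETURN @input
END
```

It could then be called as SELECT dbo.StripNonAlphanumeric(my_column) FROM My_Table. It loops once per removed character, so on large tables it may be slow as well.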