views:

900

answers:

4

This is based on a similar question How to Replace Multiple Characters in Access SQL?

I wrote this since sql server 2005 seems to have a limit on replace() function to 19 replacements inside a where clause.

I have the following task: Need to perform a match on a column, and to improve the chances of a match stripping multiple un-needed chars using replace() function

DECLARE @es NVarChar(1) SET @es = ''
DECLARE @p0 NVarChar(1) SET @p0 = '!'
DECLARE @p1 NVarChar(1) SET @p1 = '@'
---etc...

SELECT *
FROM t1,t2 
WHERE  REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es) 
     = REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es)    
---etc

If there are >19 REPLACE() in that where clause, it doesn't work. So the solution I came up with is to create a sql function called trimChars in this example (excuse them starting at @22

CREATE FUNCTION [trimChars] (
   @string varchar(max)
) 

RETURNS varchar(max) 
AS
BEGIN

DECLARE @es NVarChar(1) SET @es = ''
DECLARE @p22 NVarChar(1) SET @p22 = '^'
DECLARE @p23 NVarChar(1) SET @p23 = '&'
DECLARE @p24 NVarChar(1) SET @p24 = '*'
DECLARE @p25 NVarChar(1) SET @p25 = '('
DECLARE @p26 NVarChar(1) SET @p26 = '_'
DECLARE @p27 NVarChar(1) SET @p27 = ')'
DECLARE @p28 NVarChar(1) SET @p28 = '`'
DECLARE @p29 NVarChar(1) SET @p29 = '~'
DECLARE @p30 NVarChar(1) SET @p30 = '{'

DECLARE @p31 NVarChar(1) SET @p31 = '}'
DECLARE @p32 NVarChar(1) SET @p32 = ' '
DECLARE @p33 NVarChar(1) SET @p33 = '['
DECLARE @p34 NVarChar(1) SET @p34 = '?'
DECLARE @p35 NVarChar(1) SET @p35 = ']'
DECLARE @p36 NVarChar(1) SET @p36 = '\'
DECLARE @p37 NVarChar(1) SET @p37 = '|'
DECLARE @p38 NVarChar(1) SET @p38 = '<'
DECLARE @p39 NVarChar(1) SET @p39 = '>'
DECLARE @p40 NVarChar(1) SET @p40 = '@'
DECLARE @p41 NVarChar(1) SET @p41 = '-'

return   REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
       @string, @p22, @es), @p23, @es), @p24, @es), @p25, @es), @p26, @es), @p27, @es), @p28, @es), @p29, @es), @p30, @es), @p31, @es), @p32, @es), @p33, @es), @p34, @es), @p35, @es), @p36, @es), @p37, @es), @p38, @es), @p39, @es), @p40, @es), @p41, @es)
END

This can then be used in addition to the other replace strings

SELECT *
FROM t1,t2 
WHERE  trimChars(REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es) 
         = REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es))

I created a few more functions to do similar replacing like so trimChars(trimMoreChars(

SELECT *
FROM t1,t2 
WHERE  trimChars(trimMoreChars(REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es) 
         = REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es)))

Can someone give me a better solution to this problem in terms of performance and maybe a cleaner implementation?

+7  A: 

I would seriously consider making a CLR UDF instead and using regular expressions (both the string and the pattern can be passed in as parameters) to do a complete search and replace for a range of characters. It should easily outperform this SQL UDF.

Cade Roux
+1: awesomeness.
Juliet
+2  A: 

One useful trick in SQL is the ability use @var = function(...) to assign a value. If you have multiple records in your record set, your var is assigned multiple times with side-effects:

declare @badStrings table (item varchar(50))

INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '@'

declare @testString varchar(100), @newString varchar(100)

set @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
set @newString = @testString

SELECT @newString = Replace(@newString, item, '') FROM @badStrings

select @newString // returns 'Juliet ro0zs my s0xrzone'
Juliet
This is very cool - how does one incorporate this inside the where clause in my question above ? – thanks
kiev
@kiev: you can't put this in a WHERE clause.
Peter
@kiev: creating a user-defined function is the correct approach. However, you are better off using my approach rather than nesting a bajillion replaces in one another, since my approach supports an indefinite number of replaces. You can make the function more dynamic by passing in a comma-separated list of strings to replace, using a split function (http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648) to convert the list into a table, then returning the replaced string.
Juliet
A: 

One option is to use a numbers/tally table to drive an iterative process via a pseudo-set based query.

The general idea of char replacement can be demonstrated with a simple character map table approach:

create table charMap (srcChar char(1), replaceChar char(1))
insert charMap values ('a', 'z')
insert charMap values ('b', 'y')


create table testChar(srcChar char(1))
insert testChar values ('1')
insert testChar values ('a')
insert testChar values ('2')
insert testChar values ('b')

select 
coalesce(charMap.replaceChar, testChar.srcChar) as charData
from testChar left join charMap on testChar.srcChar = charMap.srcChar

Then you can bring in the tally table approach to do the lookup on each character position in the string.

create table tally (i int)
declare @i int
set @i = 1
while @i <= 256 begin
    insert tally values (@i)
    set @i = @i + 1
end

create table testData (testString char(10))
insert testData values ('123a456')
insert testData values ('123ab456')
insert testData values ('123b456')

select
    i,
    SUBSTRING(testString, i, 1) as srcChar,
    coalesce(charMap.replaceChar, SUBSTRING(testString, i, 1)) as charData
from testData cross join tally
    left join charMap on SUBSTRING(testString, i, 1) = charMap.srcChar
where i <= LEN(testString)
A: 

I don't know why Charles Bretana deleted his answer, so I'm adding it back in as a CW answer, but a persisted computed column is a REALLY good way to handle these cases where you need cleansed or transformed data almost all the time, but need to preserve the original garbage. His suggestion is relevant and appropriate REGARDLESS of how you decide to cleanse your data.

Specifically, in my current project, I have a persisted computed column which trims all the leading zeros (luckily this is realtively easily handled in straight T-SQL) from some particular numeric identifiers stored inconsistently with leading zeros. This is stored in persisted computed columns in the tables which need it and indexed because that conformed identifier is often used in joins.

Cade Roux