views:

36

answers:

3

I wonder if there is an easy and efficient way in SQL Server 2005 to eliminate replicated characters in a string. Like converting

'ABBBCDEEFFFFG' to 'ABCDEFG'

It really sucks that SQL Server has such a poor string library and no ready-to-use regexp feature...

+2  A: 

You can use the CLR functionality built into SQL Server 2005/2008 to get this done by .NET code.

MSDN magazine wrote about it in their February 2007 issue.

If this is not an acceptable solution, here is a UDF that will do the same, mind you this is about two orders of magnitude slower than the CLR solution.

Oded
A: 

YMMV. This appears to work for your string above. But not ABBBCDEEBBBBG

DECLARE @Numbers TABLE (Num smallint NOT NULL PRIMARY KEY)

INSERT @Numbers (Num)
SELECT TOP 8000
    ROW_NUMBER() OVER (ORDER BY c1.NAME)
FROM
    sys.columns c1

DECLARE @STuff TABLE (Seq varchar(100) NOT NULL PRIMARY KEY)

INSERT @STuff (Seq) VALUES ('ABBBCDEEFFFFG') --works

SELECT
    Single
FROM
    (
    SELECT DISTINCT
        CAST(Single AS varchar(100))
    FROM
       @Numbers N
       CROSS APPLY 
       (SELECT Seq, SUBSTRING(Seq, Num, 1) AS Single FROM @Stuff) S
    WHERE
       Num <= LEN(Seq)
    FOR XML PATH ('')
    ) foo(Single)
gbn
A: 

I know about the CLR solution, but as I said, I am neither responsible nor authorized to implement it in the DB of question.

For this particular problem, I decided to write a very simple and kinda silly loop. I am afraid it won't be fast enough for millions of records, but anyways... I wish I could do this stuff in the application layer but I am bound to T-SQL here..

  DECLARE @i int ;  -- counter
  DECLARE @input varchar(200) ;
  SET @input = 'AAABCDEEFFBBBXYZSSSWWWNT'

  IF LEN(@input) > 1
  BEGIN
    DECLARE @unduplicated varchar(200) ;
    SET @unduplicated = SUBSTRING(@input,1,1) ;

    SET @i = 2 ;
    WHILE @i <= LEN(@input)
    BEGIN
      -- If current char is different from the last char, concatenate, else not
      IF SUBSTRING(@unduplicated, LEN(@unduplicated), 1) <> SUBSTRING(@input, @i, 1)
        SET @unduplicated = @unduplicated + SUBSTRING(@input, @i, 1) ;
      SET @i = @i + 1;
    END
  END  
  SELECT @unduplicated AS unduplicated;

Result:

unduplicated 
ABCDEFBXYZSWNT
ercan