views: 255

answers: 1

I have a varchar(4000) text column containing text such as:

'aaabbaaacbaaaccc'

and I need to collapse all runs of duplicated characters, so that only one character from each run is left:

'abacbac'

It must not be a function, procedure, CLR, or regex solution - only a true SQL SELECT.

Currently I am thinking about using a recursive WITH clause that applies REPLACE: 'aa'->'a', 'bb'->'b', 'cc'->'c'.

So the recursion should cycle until all duplicated sequences of those characters have been replaced.
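
For the example string above, a minimal sketch of that recursive-REPLACE idea (the CTE name collapse and column s are just illustrative, and it only handles the three characters listed, which is exactly its weakness):

    WITH collapse(s) AS
    (   SELECT CAST('aaabbaaacbaaaccc' AS varchar(4000))
         UNION ALL
        -- re-run the replacements until a pass changes nothing
        SELECT CAST(REPLACE(REPLACE(REPLACE(s, 'aa', 'a'), 'bb', 'b'), 'cc', 'c') AS varchar(4000))
          FROM collapse
         WHERE s <> REPLACE(REPLACE(REPLACE(s, 'aa', 'a'), 'bb', 'b'), 'cc', 'c')
    )
    SELECT TOP 1 s   -- the shortest row is the fully collapsed result
      FROM collapse
     ORDER BY LEN(s)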

Do you have another, perhaps more performant, solution?

PS: I searched this site for various REPLACE examples, but they didn't suit this case.

+3  A: 

Assuming a table definition of

CREATE TABLE myTable(rowID INT IDENTITY(1,1), dupedchars NVARCHAR(4000)) 

and sample data

    INSERT INTO myTable
    SELECT 'aaabbaaacbaaaccc'
    UNION
    SELECT 'abcdeeeeeffgghhaaabbbjdduuueueu999whwhwwwwwww'

this query meets your criteria:

    WITH Numbers(n) AS
    (   SELECT 1 AS n
         UNION ALL
        SELECT n + 1
          FROM Numbers
         WHERE n < 4000
    )
    SELECT rowid,
           (   SELECT CASE
                          -- emit a character only when it differs from the one after it;
                          -- the appended space keeps the final character of the string
                          WHEN SUBSTRING(dupedchars, n2.n, 1) = SUBSTRING(dupedchars + ' ', n2.n + 1, 1) THEN ''
                          ELSE SUBSTRING(dupedchars, n2.n, 1)
                      END AS [text()]
                 FROM myTable t2, Numbers n2
                WHERE n2.n <= LEN(dupedchars)
                  AND t.rowid = t2.rowid
                ORDER BY n2.n          -- make the concatenation order explicit
                  FOR XML PATH('')
           ) AS deduped
      FROM myTable t
    OPTION (MAXRECURSION 4000)
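
(One caveat worth knowing: FOR XML PATH('') entitizes XML-special characters, so input containing <, > or & would come out as &lt;, &gt; or &amp; and need extra handling; the sample data here is unaffected.)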

Output

rowid   deduped
   1    abacbac
   2    abcdefghabjdueueu9whwhw
CResults
CResults: it's fantastic! I was thinking along almost the same lines, but with a different approach - yours is more universal. Thanks! And what about performance for a table with 100,000+ rows? Am I right that this is the only way to do this in native SQL?
zmische
For that many rows you're looking at an execution time of around 10 seconds. The alternative (which I looked at originally) would be to replace the Numbers CTE with an indexed physical table. You *may* get some improvement from that, but the slow part of the query is the de-duping - any string manipulation of this type carries a speed overhead.
CResults
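
For reference, that indexed physical Numbers table could be set up like this (a sketch; the sys.all_objects cross join is just one common way to generate the rows, and with the table in place the WITH Numbers(n) block and the MAXRECURSION hint can be dropped from the query above):

    -- one-off setup: a persisted, indexed tally table
    CREATE TABLE Numbers (n INT NOT NULL PRIMARY KEY);

    INSERT INTO Numbers (n)
    SELECT TOP (4000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM sys.all_objects a
     CROSS JOIN sys.all_objects b;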
Note that the 10 seconds is based on string lengths similar to those above. As mentioned, the time goes on the de-duping. Fill all your fields to 4000 characters and you're looking at around 1,000 results per minute. If your fields contain duplicate values, you can optimise by supplying only the distinct values to this query.
CResults
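
That "distinct values only" optimisation might look like the following sketch, which reuses the answer's de-dupe expression but runs it once per distinct string before joining the results back to every row (table and column names come from the answer above):

    WITH Numbers(n) AS
    (   SELECT 1
         UNION ALL
        SELECT n + 1 FROM Numbers WHERE n < 4000
    ),
    DistinctStrings AS
    (   SELECT DISTINCT dupedchars FROM myTable
    ),
    Deduped AS
    (   -- the expensive character-collapse runs once per distinct string
        SELECT s.dupedchars,
               (   SELECT CASE
                              WHEN SUBSTRING(s.dupedchars, n2.n, 1) = SUBSTRING(s.dupedchars + ' ', n2.n + 1, 1) THEN ''
                              ELSE SUBSTRING(s.dupedchars, n2.n, 1)
                          END AS [text()]
                     FROM Numbers n2
                    WHERE n2.n <= LEN(s.dupedchars)
                    ORDER BY n2.n
                      FOR XML PATH('')
               ) AS deduped
          FROM DistinctStrings s
    )
    SELECT t.rowID, d.deduped
      FROM myTable t
      JOIN Deduped d ON d.dupedchars = t.dupedchars
    OPTION (MAXRECURSION 4000)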