views:

138

answers:

5

I know MS T-SQL does not support regular expression, but I need similar functionality. Here's what I'm trying to do:

I have a varchar table field which stores a breadcrumb, like this:

/ID1:Category1/ID2:Category2/ID3:Category3/

Each Category name is preceded by its Category ID, separated by a colon. I'd like to select and display these breadcrumbs but I want to remove the Category IDs and colons, like this:

/Category1/Category2/Category3/

Everything between the leading slash (/) up to and including the colon (:) should be stripped out.

I don't have the option of extracting the data, manipulating it externally, and re-inserting back into the table; so I'm trying to accomplish this in a SELECT statement.

I also can't resort to using a cursor to loop through each row and clean each field with a nested loop, due to the number of rows returned in the SELECT.

Can this be done?

Thanks all - Jay

A: 

You can use SQL CLR. Here's an MSDN article:

JoshBerke
+1  A: 

For SQL Server 2005+, you can get regex support by:

  1. Enabling CLR (doesn't require instance restart)
  2. Uploading your CLR functionality (in this case, regex replace)

Using native TSQL, you'll need to define REPLACE statements for everything you want to remove:

SELECT REPLACE(
         REPLACE(
           REPLACE(''/ID1:Category1/ID2:Category2/ID3:Category3/'', 'ID1:', ''),
            'ID2:', ''), 
         'ID3:', '')

Regex or otherwise, you need to be sure these patterns don't appear in the actual data.

OMG Ponies
+1  A: 

You might be able to do this using a Split function. The following split function relies on the existence of a Numbers table which literally contains a sequential list of numbers like so:

Create Table dbo.Numbers( Value int not null primary key clustered )
GO
With Nums As
    (
    Select ROW_NUMBER() OVER( Order By o.object_id ) As Num
    From sys.objects as o
        cross join sys.objects as o2
    )
Insert dbo.Numbers( Value )
Select Num
From Nums
Where Num Between 1 And 10000
GO  


Create Function [dbo].[udf_Split] (@DelimitedList nvarchar(max), @Delimiter nvarchar(2) = ',')
Returns @SplitResults TABLE (Position int NOT NULL PRIMARY KEY, Value nvarchar(max))
AS
/*
PURPOSE: to split the @DelimitedList based on the @Delimter
DESIGN NOTES:
    1. In general the contents of the next item is: NextDelimiterPosition - CurrentStartPosition
    2. CurrentStartPosition = 
        CharIndex(@Delimiter, A.list, N.Value)  = Current Delimiter position
        + Len(@Delimiter)                       + The number of delimiter characters 
        + 1                                     + 1 since the text of the item starts after the delimiter
    3. We need to calculate the delimiter length because the LEN function excludes trailing spaces. Thus
        if a delimiter of ", " (a comma followed by a space) is used, the LEN function will return 1.
    4. The DataLength function returns the number of bytes in the string. However, since we're using
        an nvarchar for the delimiter, the number of bytes will double the number of characters.
*/
Begin
    Declare @DelimiterLength int
    Set @DelimiterLength = DataLength(@Delimiter) / 2

    If Left(@DelimitedList, @DelimiterLength) <> @Delimiter
        Set @DelimitedList = @Delimiter + @DelimitedList

    If Right(@DelimitedList, @DelimiterLength) <> @Delimiter
        Set @DelimitedList = @DelimitedList + @Delimiter

    Insert @SplitResults(Position, Value)
    Select CharIndex(@Delimiter, A.list, N.Value) + @DelimiterLength            
        , Substring (
                    A.List
                    , CharIndex(@Delimiter, A.list, N.Value) + @DelimiterLength         
                    , CharIndex(@Delimiter, A.list, N.Value + 1)                            
                        - ( CharIndex(@Delimiter, A.list, N.Value) + @DelimiterLength ) 
                    )
    From dbo.Numbers As N
        Cross Join (Select @DelimitedList As list) As A
    Where N.Value > 0
        And N.Value < LEN(A.list)
        And Substring(A.list, N.Value, @DelimiterLength) = @Delimiter
    Order By N.Value

    Return
End

You then might be able to run a query like so where you strip out the prefixes:

Select Table, Substring(S.Value, CharIndex(':', S.Value) + 1, Len(S.Value))
From Table
    Cross Apply dbo.udf_Split(Table.ListColumn, '/') As S

This would give you values like:

Category1
Category2
Category3

You could then use FOR XML PATH to combine them again:

Select Table.PK
    ,   Stuff(  (
                Select '/' + Substring(S.Value, CharIndex(':', S.Value) + 1, Len(S.Value))
                From Table As Table1
                    Cross Apply dbo.udf_Split(Table.ListColumn, '/') As S1
                Where Table1.PK = Table.PK
                Order By S1.Position
                For Xml Path('')
                ), 1, 1, '') As BreadCrumb
From Table
Thomas
MY suggestion is that you create a new field to store the value you want and use a process like the one able to populate it, then populate all new records on data entry (and any changes to the original field) when it would take less time. To try to do this sort of process every time you need to select is painful.
HLGEM
@HLGEM - Agreed. I would use the above functionality to do a one-time population of a "cleaned" breadcrumb column.
Thomas
I also agree, this would be a bit onerous every time a SELECT is run. However, creating, populating and maintaining an additional "cleaned" column just wasn't an option for me at this time. I needed an alternative approach. Greatly appreciate the contribution, nonetheless, Thomas.
Jay
+1  A: 

I think your best bet is going to be to use a recursive user-defined function (UDF). I've included some code here that you can use to pass in a string to achieve the results you're looking for.

CREATE FUNCTION ufn_StripIDsFromBreadcrumb (@cIndex int, @breadcrumb varchar(max), @theString varchar(max))

RETURNS varchar(max)

AS

BEGIN
DECLARE @nextColon int
DECLARE @nextSlash int

SET @nextColon = CHARINDEX(':', @theString, @cIndex)
SET @nextSlash = CHARINDEX('/', @theString, @nextColon)
SET @breadcrumb = @breadcrumb + SUBSTRING(@theString, @nextColon + 1, @nextSlash - @nextColon)

IF @nextSlash != LEN(@theString)

     BEGIN
     exec @breadcrumb = ufn_StripIDsFromBreadcrumb @cIndex =  @nextSlash, @breadcrumb = @breadcrumb, @theString = @theString
     END
RETURN @breadcrumb
END

You could then execute it with:

DECLARE @myString varchar(max)
EXEC @myString = ufn_StripIDsFromBreadcrumb 1, '/', '/ID1:Category1/ID2:Category2/ID3:Category3/'
PRINT @myString
Maashu
This was the easiest to implement into our current solution. I'd seen references to enabling CLR as Josh and OMG Ponies suggested, but recursion just didn't occur to me. Greatly appreciate all the input, all.
Jay
+1  A: 

This works for SQL Server 2005 and up.

create table strings (
  string varchar(1000)
)

insert into strings values( '/ID1:Category1/ID2:Category2/ID3:Category3/' )  
insert into strings values( '/ID4:Category4/ID5:Category5/ID8:Category6/' )  
insert into strings values( '/ID7:Category7/ID8:Category8/ID9:Category9/' )  
go

with  
replace_with_wildcard ( restrung ) as 
( 
  select replace( string, '', '' ) 
  from strings

  union all 

  select  
    replace( restrung, substring( restrung, patindex( '%ID%', restrung ), 4 ), '' ) 
  from replace_with_wildcard 
  where patindex( '%ID%', restrung ) > 0 
) 

select restrung
from replace_with_wildcard 
where charindex( ':', restrung ) = 0
order by restrung

drop table strings 
Oliver