views:

6701

answers:

7

How could you remove all characters that are not alphabetic from a string?

What about non-alphanumeric?

Does this have to be a custom function or are there also more generalizable solutions?

+1  A: 

I don't think there is any SQL based functions which can be easily used to do this.

But here is an article on writing the function you're looking for: SQL STrim Function

Salgar
A: 

you need to do a custom replace functions

SQLMenace
+1  A: 

I knew that SQL was bad at string manipulation, but I didn't think it would be this difficult. Here's a simple function to strip out all the numbers from a string. There would be better ways to do this, but this is a start.

CREATE FUNCTION dbo.AlphaOnly (
    @String varchar(100)
)
RETURNS varchar(100)
AS BEGIN
  RETURN (
    REPLACE(
      REPLACE(
        REPLACE(
          REPLACE(
            REPLACE(
              REPLACE(
                REPLACE(
                  REPLACE(
                    REPLACE(
                      REPLACE(
                        @String,
                      '9', ''),
                    '8', ''),
                  '7', ''),
                '6', ''),
              '5', ''),
            '4', ''),
          '3', ''),
        '2', ''),
      '1', ''),
    '0', '')
  )
END
GO

-- ==================
DECLARE @t TABLE (
    ColID       int,
    ColString   varchar(50)
)

INSERT INTO @t VALUES (1, 'abc1234567890')

SELECT ColID, ColString, dbo.AlphaOnly(ColString)
FROM @t

Output

ColID ColString
----- ------------- ---
    1 abc1234567890 abc

Round 2 - Data-Driven Blacklist

-- ============================================
-- Create a table of blacklist characters
-- ============================================
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.CharacterBlacklist'))
  DROP TABLE dbo.CharacterBlacklist
GO
CREATE TABLE dbo.CharacterBlacklist (
    CharID              int         IDENTITY,
    DisallowedCharacter nchar(1)    NOT NULL
)
GO
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'0')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'1')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'2')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'3')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'4')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'5')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'6')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'7')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'8')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'9')
GO

-- ====================================
IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID('dbo.StripBlacklistCharacters'))
  DROP FUNCTION dbo.StripBlacklistCharacters
GO
CREATE FUNCTION dbo.StripBlacklistCharacters (
    @String nvarchar(100)
)
RETURNS varchar(100)
AS BEGIN
  DECLARE @blacklistCt  int
  DECLARE @ct           int
  DECLARE @c            nchar(1)

  SELECT @blacklistCt = COUNT(*) FROM dbo.CharacterBlacklist

  SET @ct = 0
  WHILE @ct < @blacklistCt BEGIN
    SET @ct = @ct + 1

    SELECT @String = REPLACE(@String, DisallowedCharacter, N'')
    FROM dbo.CharacterBlacklist
    WHERE CharID = @ct
  END

  RETURN (@String)
END
GO

-- ====================================
DECLARE @s  nvarchar(24)
SET @s = N'abc1234def5678ghi90jkl'

SELECT
    @s                  AS OriginalString,
    dbo.StripBlacklistCharacters(@s)   AS ResultString

Output

OriginalString           ResultString
------------------------ ------------
abc1234def5678ghi90jkl   abcdefghijkl

My challenge to readers: Can you make this more efficient? What about using recursion?

Rob Garrison
you could probably write a better dbo.StripBlacklistCharacters() without a loop using http://www.sommarskog.se/arrays-in-sql-2005.html#tblnum table of numbers joined into your black list table, but I'm too lazy today to try it myself....
KM
+13  A: 

Try this function:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    While PatIndex('%[^a-z]%', @Temp) > 0
     Set @Temp = Stuff(@Temp, PatIndex('%[^a-z]%', @Temp), 1, '')

    Return @TEmp
End

Call it like this:

Select dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl')

Once you understand the code, you should see that it is relatively simple to change it to remove other characters, too. You could even make this dynamic enough to pass in your search pattern.

Hope it helps.

G Mastros
Slick. Much simpler than mine.
Rob Garrison
That is an awesome implementation
SQLMenace
This code removes non-alpha characters (so numbers are also removed). If you want to leave the numbers (remove non-alpha numeric characters), then... replace ^a-z with ^a-z^0-9That search string appears in the code in two different places. Be sure to replace both of them.
G Mastros
From Jeff's comment: I think that if wanted to strip all non-letters and non-numbers, you would want '^a-z0-9' (versus '^a-z^0-9', which would leave ^ in the string).
Even Mien
+1  A: 

Parameterized version of G Mastros' awesome answer:

CREATE FUNCTION [dbo].[fn_StripCharacters]
(
    @String NVARCHAR(MAX), 
    @MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    SET @MatchExpression =  '%['+@MatchExpression+']%'

    WHILE PatIndex(@MatchExpression, @String) > 0
        SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')

    RETURN @String

END

Alphabetic only:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^a-z')

Numeric only:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^0-9')

Alphanumeric only:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^a-z0-9')

Non-alphanumeric:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', 'a-z0-9')
Even Mien
+1  A: 

This is an excellent function, but I think that if wanted to strip all non-letters and non-numbers, you would want '^a-z0-9' (versus '^a-z^0-9', which would leave ^ in the string).

Jeff
You're absolutely correct. I corrected my example.
Even Mien
A: 

I put this in both places where PatIndex is called.

PatIndex('%[^A-Za-z0-9]%', @Temp)

for the custom function above RemoveNonAlphaCharacters and renamed it RemoveNonAlphaNumericCharacters