views:

393

answers:

4

I'm having an issue extracting a substring in SQL query results.

Here's the situation: I have a column that contains strings in the following format "ax123456uhba", "ax54232hrg", "ax274895rt", "ax938477ed1", "ax73662633wnn2"

I need to extract the numerical string that is preceded and followed by letters. However, occasionally there is a number in the trailing string that I don't need. The length of the trailing characters is not static so I can't just do a simple substring function.

I'm not necessarily asking for completed code, just a helpful push in the right direction if possible.

Thanks in advance for your help.

A: 

If you're using .NET, you could grab it using a regex:

var input = "ax938477ed1";
var reg = new Regex("[0-9]+");
var match = reg.Match(input);
int number = -1;
if (match.Success)
    number = Convert.ToInt32(match.Groups[0].Value);

This will store 938477 in number.

Andomar
A: 

Probably using a regex would be the easiest.

Depends on the database - some have regex functions - (SQL Server looks like it can be added to the server Untested MSDN article

Otherwise you can cut down the query by using like. Sybase allows where x like '%[0-9]&' to find rows with a number in it then use a regex in the client.

Mark
+1  A: 

It looks like PATINDEX is what you need.

Returns the first index of a pattern found in a string - expects regular expression see this -> http://blog.sqlauthority.com/2007/05/13/sql-server-udf-function-to-parse-alphanumeric-characters-from-string/

Here's the code copied here to strip out alphanumeric characters from a string - it shouldn't take too long to change this to strip out first contiguous series of digits from a string.

CREATE FUNCTION dbo.UDF_ParseAlphaChars
(
@string VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @IncorrectCharLoc SMALLINT
    SET @IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%', @string)

    WHILE @IncorrectCharLoc > 0
    BEGIN
        SET @string = STUFF(@string, @IncorrectCharLoc, 1, '')
        SET @IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%', @string)
    END

    SET @string = @string

    RETURN @string
END
GO
Doctor Chris Chris
Excellent find! Thanks a million!
Jon Ownbey
For the record, this UDF will remove all letters. It would translate "a9d1" to 91.
Andomar
A: 

I agree with using RegEx's for this, assuming you're on SQL 2005 or 2008 where you can use the CLR. Here are is some UDF code for using RegEx's in SQL Server that should be helpful:

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.SqlServer.Server

Partial Public Class UserDefinedFunctions

    <Microsoft.SqlServer.Server.SqlFunction()>
    Public Shared Function IsRegexMatch(ByVal input As SqlString, ByVal pattern As SqlString) As SqlBoolean
        If input.IsNull OrElse pattern.IsNull Then Return SqlBoolean.Null
        Return Regex.IsMatch(input.Value, pattern.Value, RegexOptions.IgnorePatternWhitespace Or RegexOptions.Singleline Or RegexOptions.Multiline)
    End Function

    <Microsoft.SqlServer.Server.SqlFunction()>
    Public Shared Function RegexReplace(ByVal input As SqlString, ByVal pattern As SqlString, ByVal replacement As SqlString) As SqlString
        If input.IsNull OrElse pattern.IsNull OrElse replacement.IsNull Then Return SqlString.Null
        Return Regex.Replace(input.Value, pattern.Value, replacement.Value, RegexOptions.IgnorePatternWhitespace Or RegexOptions.Singleline Or RegexOptions.Multiline)
    End Function
End Class
mattmc3