tags:

views:

250

answers:

3

Suppose there is a table "A" with 2 columns - ID (INT), DATA (VARCHAR(100)). Executing "SELECT DATA FROM A" results in a table looks like:

DATA
---------------------
Nowshak 7,485 m
Maja e Korabit (Golem Korab) 2,764 m
Tahat 3,003 m
Morro de Moco 2,620 m
Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)
Mount Kosciuszko 2,229 m
Grossglockner 3,798 m
// the DATA continues...
---------------------

How can I extract only the numerical data using some kind of string processing function in the SELECT SQL query so that the result from a modified SELECT would look like this:

DATA (in INTEGER - not varchar)
---------------------
7485
2764
3003
2620
6960
2229
3798
// the DATA in INTEGER continues...
---------------------

By the way, it would be best if this could be done in a single SQL statement. (I am using IBM DB2 version 9.5)

Thanks :)

+2  A: 

In Oracle:

SELECT TO_NUMBER(REGEXP_REPLACE(data, '[^0-9]', ''))
FROM a

In PostgreSQL:

SELECT CAST(REGEXP_REPLACE(data, '[^0-9]', '', 'g') AS INTEGER)
FROM a

In MS SQL Server and DB2, you'll need to create UDF's for regular expressions and query like this.

See links for more details.

Quassnoi
I'm sorry. I forgot to mention that I am using DB2 version 9.5.
natch3z
never knew that you can use regex in sql +1
roman m
can you post one for MSSQL Server?
roman m
A: 

Doing a quick search on line for DB2 the best inbuilt function I can find is Translate It lets you specify a list of characters you want to change to other characters. It's not ideal, but you can specify every character that you want to strip out, that is, every non numeric character available...

(Yes, that's a long list, a very long list, which is why I say it's not ideal)

TRANSLATE('data', 'abc...XYZ,./\<>?|[and so on]', ' ')

Alternatively you need to create a user defined function to search for the number. There are a few alternatives for that.

  1. Check each character one by one and keep it only if it's a numeric.

  2. If you know what precedes the number and what follows the number, you can search for those and keep what is in between...

Dems
A: 

To elaborate on Dems's suggeston, the approach I've used is a scalar user-defined function (UDF) that accepts an alphanumeric string and recursively iterates through the string (one byte per iteration) and suppresses the non-numeric characters from the output. The recursive expression will generate a row per iteration, but only the final row is kept and returned to the calling application.

Fred Sobotka