tags:

views:

294

answers:

2

I am trying to query something like

select emp_id from dept.employee where
  firstname like '%sam%' or 
  firstname like '%SAM%' or
  firstname like '%will%' or
  firstname like '%WILL%'

Is it possible to put it in regex something like

select emp_id from dept.employee where
  firstname like '%sam|SAM|will|WILL%'

or

select emp_id from dept.employee where
  upper(firstname) like '%sam%' or
  upper(firstname) like '%will%'

I am using DB2 UDB9.

Cheers, Prateek

+1  A: 

Unfortunately, there is no immediate Regex function available in DB2. But it is possible to have an external user-defined function (calling either an external CLI, .NET or Java function) that implements regular expressions. Here's a ready-to-use example by IBM:

http://www.ibm.com/developerworks/data/library/techarticle/0301stolze/0301stolze.html

DrJokepu
+1  A: 

That's either a bad example or a bad database design :-).

You should probably not be storing first (or any) names in any way that will require you to use LIKE. You need to keep in mind that tables are almost always read far more often than written, and design accordingly.

That means you want the cost to be on insertion or update, not on selecting. Per-row functions such as upper(name) never scale well to proper enterprise-class databases.

In my opinion, you should have, for DB2, the following:

  • an insert/update trigger that will remove leading and trailing spaces from the first (and last) name.
  • a generated column that will uppercase the names (this uses more storage but that's usually better than wasting time). I'm not sure if UDB9 has generated columns (DB2/z has) but you can do this in the same insert/update trigger. Basically it's an extra column that's always set to the uppercase version of another field.
  • an index on the generated column, not the original column.

That way, your selects will scream along with queries like (big and ugly, but efficient):

select * from tbl
where generatedname = 'SAM'
or    generatedname = 'SAMUEL'
or    generatedname = 'SAMANTHA'
or    generatedname = 'WILL'
or    generatedname = 'WILLIAM'
or    generatedname = 'WILLOMENA'

or (less big and ugly, just as efficient, and closer to the original query in intent):

select * from tbl
where generatedname like 'SAM%'
or    generatedname like 'WILL%'

using the full power of the query optimizer (DB2, and other DBMS' I would think, can still optimize 'XX%' easily if the field is indexed).

I'm not a big fan of using LIKE for any decent sized tables although sometimes there's not much choice. I can't think of any viable situation in which you'd want to look for "%SAM%" and doing so results in an inability to use the optimizer to its fullest extent.

paxdiablo
that was a bad example this is for a more less queried field more of description like field for a record.
prateek urmaliya