I have a query where I am searching against a string:


Now, this works fine, but it doesn't scale well, and I need to optimize it. I have found an option along the lines of creating a generated view, or something like that, but I was hoping for a simpler solution using an index.

We are using DB2, and I really want to use an expression in an index, but this option seems to only be available on z/OS, however we are running Linux. I tried the expression index anyways:

CREATE INDEX city_upper_name_idx

But of course, it chokes on the UPPER(name).

Is there another way I can create an index or something similar in this manner such that I don't have to restructure my existing queries to use a new generated view, or alter my existing columns, or any other such intrusive change?

EDIT: I'm open to hearing solutions for other databases... it might carry over to DB2...

+5  A: 

You could add an indexed column holding a numerical hash key of the city name. (With duplicates allowed).

Then you could do a multiclause where:

hash = [compute hash key for 'SAN FRANCISCO']

SELECT county FROM city WHERE cityHash = hash AND UPPER(name) = 'SAN FRANCISCO';

Alternatively, go through your db manual and look at the options for creating table indexes. There might be something helpful.

+1  A: 

Oracle supports function-based indexes. Their canonical example:

 create index emp_upper_idx on emp(upper(ename));
Mark Harrison
Unfortunately, DB2/LUW doesn't have that yet, but it's probably coming since DB2/z does have it.

PostgreSQL also supports indexing the results of a function:

CREATE INDEX mytable_lower_col1_idx ON mytable (lower(col1));

The only other option I can think of is to de-normalize your data a bit by creating another column to hold the upper-case version (updated by triggers) and index that. Blech!

Don't need triggers and such, DB2 supports generated columns.
+1  A: 

I don't know whether this would work in DB2, but I'll tell you how I'd do this in SQL Server. I think the way MSSQL does this is ANSI standard, though the specific collation strings may differ. Anyway, if you can do this without trashing the rest of your application -- are there other places where the "name" column needs to be case-sensitive? -- try making that whole column case-insensitive by changing the collation, then index the column.

ALTER TABLE city ALTER COLUMN name nvarchar(200) 
    COLLATE SQL_Latin1_General_CP1_CI_AS

...where "nvarchar(200)" stands in for whatever's your current column data type. The "CI" part of the collation string is what marks it as case-insensitive in MSSQL.

To explain... my understanding is that the index will store values in the order of the indexed column's collation. Making the column's collation be case-insensitive would make the index store 'San Francisco', 'SAN FRANCISCO', and 'san francisco' all together. Then you should just have to remove the "UPPER()" from your query, and DB2 should know that it can use your index.

Again, this is based solely on what I know about SQL Server, plus a couple minutes looking at the SQL-92 spec; it may or may not work for DB2.

+1  A: 

DB2 isn't strong regarding collation. And it doesn't have function-based indexes.

Niek Sanders's suggestion would work, if you can accept that the hashing has to happen in your application (as DB2 doesn't have SHA or MD5 functions, as far as I know).

However, if I were you, I'd create a materialized view (MQT == Materialized Query Table, in db2 parlance) using CREATE TABLE AS, adding a column with a pre-computed upper-case variant of the name. Note: You may add indexes to materialized views in DB2.

Troels Arvin
It's cheaper in terms of both storage and speed just to add another generated column to the existing table rather than have a whole other table. And I dispute that DB2 is weak with collation. And, your sister's ugl... sorry, got carried away :-).
+2  A: 

Short answer, no.

Long answer, yes if you're running on the mainframe, but you're not, so you have to use other trickery.

DB2 (as of DB2/LUW v8) now has generated columns so you can:

    lname  VARCHAR(20),
    fname  VARCHAR(20),

and then create an index on ulname. I'm not sure you're going to get it simpler than that.

Before that, you used to have to use a combination of insert and update triggers to ensure the ulname column was kept in sync, and this was a nightmare to maintain. Also, now that this functionality is part of the core DBMS, it's been highly optimized (it's much faster than the trigger-based solution) and doesn't get in the way of real user triggers, so no extra DB objects to maintain.

See here for details.