I'm implementing a database where several tables have string data as candidate keys (eg: username) and will be correspondingly indexed. For these fields I want:

  1. Case insensitivity when someone queries the table on those keys

  2. The initially written case to be preserved somehow so that the application can present the data to the user with the original case used

I also want the database schema to be as database-independent as possible, as the application code is not (or at least should not be) slaved to a particular RDBMS.

Also worth noting is that the vast majority of queries done on the database will be done by the application code, not via direct table access by the client.

In implementing this, I'm running into a lot of annoying issues. One is that not all RDBMSs implement COLLATE (which is where case sensitivity appears to be tunable at the schema level) in the same way. Another is that collation and case-sensitivity options can be set at multiple levels (server, database, table (?), column) and I can't guarantee to the application what setting it will get. Yet another is that COLLATE itself can get hairy, because there is a heck of a lot more in there than simple case sensitivity (eg: Unicode options).
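
For example (the collation names below are just illustrative; they vary by product and version), the per-column COLLATE syntax already looks quite different between two common systems:

-- MySQL: collation names like utf8mb4_general_ci ("_ci" = case-insensitive)
CREATE TABLE users (user_name VARCHAR(64) COLLATE utf8mb4_general_ci);

-- SQL Server: a completely different naming scheme for the same idea
CREATE TABLE users (user_name NVARCHAR(64) COLLATE SQL_Latin1_General_CP1_CI_AS);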

To avoid all of these headaches, what I'm considering is dodging the issue altogether by storing two columns for one piece of data. One column with the original case, another dropped to lower case by the application layer.

eg: Two of the fields in the table

user_name = "fredflintstone" (a unique index on this one)
orig_name = "FredFlintstone" (just data... no constraints)
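
In schema terms, a minimal sketch (generic SQL; the table and column names are just the ones above):

CREATE TABLE users (
    user_name VARCHAR(64) NOT NULL, -- lowercased by the application layer
    orig_name VARCHAR(64) NOT NULL  -- original case, for display only
);

-- the unique index goes on the lowercased column only
CREATE UNIQUE INDEX users_user_name ON users (user_name);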

The pros and cons of this as I see it are:

Pros:

  1. No ambiguity - the application code will manage the case conversions and I never need to worry about unit tests failing "mysteriously" when the underlying RDBMS/settings changes.

  2. Searches on the index will be clean and never be slowed down by collation features or calls to LOWER() or anything (assuming such things slow down the index, which seems logical)

Cons:

  1. Extra storage space required for the doubled-up data

  2. It seems a bit brutish

I know it will work, but at the same time it smells wrong.

Is it insane/pointless to do this? Is there something I don't know that makes the case sensitivity issue less tricky than it seems to me at the moment?

+1  A: 

Suggest your search queries do something like this:

  • SELECT * FROM Users WHERE LOWER(UserName) = LOWER('fredFlintstone')
  • explicitly include a COLLATE clause on the query when case sensitivity should be ignored/respected (see the sketch after this list)
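
For instance, in SQL Server (the collation name here is just one of its built-in case-insensitive options):

SELECT * FROM Users
WHERE UserName = 'fredFlintstone' COLLATE SQL_Latin1_General_CP1_CI_AS;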

I'd consider the duplication of data for case sensitivity too onerous.

p.campbell
Does LOWER kill the benefit of indexing?
Russ
Most DBMSs will not be able to use an index for such a query, which could be a problem. The obvious solution is function-based indexes, but not all DBMSs have them. Plus, the problems with different COLLATE implementations remain, as mentioned in the question.
sleske
@Russ, in Oracle you can create a function-based index so that the column itself is indexed with LOWER() (http://www.akadia.com/services/ora_function_based_index_2.html)
tanging
@Russ: Yes, it does, unless you create a function-based index for "LOWER(UserName)". However, not all DBMS have function-based indices (e.g. Oracle and PostgreSQL do, MySQL does not).
sleske
@sleske - Thanks - I will look into function-based indexing, even though it's not available across all DBMS implementations. This is not something I've heard of before.
Russ
+2  A: 

Of course, decisions like this are always a trade-off, but I don't think this is necessarily "doubled-up data". Lowercasing a string can be a non-trivial operation, in particular once you go beyond ASCII, so the lowercased version of the string is not just a "duplicate". It is somewhat related to the original string, but no more than that.

If you think of it as an analog to storing computed results in the DB, it becomes more natural.

The option of querying on UPPER(UserName) is another good solution, which avoids the second column. However, to use it you need at least a reliable UPPER function (where in particular you can control the locale that it uses for non-ASCII characters), and probably function-based indices for decent performance.

sleske
+1  A: 

I've often seen data duplicated in this way for performance reasons. It lets you keep the original casing (which you'll obviously need, since you can't always reconstruct what the casing should be; you can't assume each name begins with a capital letter, for example). If the database doesn't support other ways to do this (functional indexes), then it's practical, not crazy. You can keep the two columns consistent by using triggers, as sketched below.
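
A MySQL-flavoured sketch of such a trigger (the table and column names are the ones from the question, so this assumes that schema; you'd want a matching BEFORE UPDATE trigger as well):

-- keep the lowercased lookup column in sync with the display column
CREATE TRIGGER users_lowercase_name
BEFORE INSERT ON users
FOR EACH ROW
SET NEW.user_name = LOWER(NEW.orig_name);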

steinar
"practical, not crazy" is exactly how I'm viewing it. But it seems like it should be a common problem and I have the feeling I'm missing something since I have not seen this done and don't see any examples around of this.
Russ
Well, there are some databases that just don't support functional indexes. You'll very likely have to index data like this, so in those databases this will be the only option. I've seen serious applications using this kind of indexing, so I would definitely say it's not crazy or out of the ordinary. Where functional indexes are supported (e.g. Oracle and PostgreSQL), using those would probably make more sense.
steinar
+1  A: 

"Searches on the index will be clean and never be slowed down by collation features or calls to LOWER() or anything (assuming such things slow down the index, which seems logical)"

No, that's not logical. You can have indexes on the results of deterministic functions.

create index users_name on users(name); -- index on name
create index users_name_lower on users(lower(name)); -- index on the function result

Your RDBMS should be smart enough to know to use users_name_lower when it gets this query:

select * from users where lower(name) = ?

Without users_name_lower, yes, that would have to walk the table. With the functional index, it does the right thing.
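
You can check this with your DBMS's plan inspector; for example, in PostgreSQL (one of the systems that supports indexes on expressions):

EXPLAIN SELECT * FROM users WHERE lower(name) = 'fredflintstone';
-- should report an index scan on users_name_lower, not a sequential scan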

Andy Lester