views:

110

answers:

5

What are some reasons for choosing a case sensitive collation over a case insensitive one? I can see perhaps a modest performance gain for the DB engine in doing string comparisons. Is that it? If your data is set to all lower or uppercase then case sensitive could be reasonable but it's a disaster if you store mixed case data and then try to query it. You have to then say apply a lower() function on the column so that it'll match the corresponding lower case string literal. This prevents index usage in every dbms that I've used. So wondering why anyone would use such an option.

+2  A: 

Depends on the data you want to store. Most UNIX filesystems are databases with case-sensitive keys. YouTube videos seem to be organised with case-sensitive keys.

Most of the time you want case-insensitive searches, but clearly there are certain exceptions.

Artelius
+7  A: 

There are many examples of data with keys that are naturally case sensitive:

  • Files in a case sensitive filesystem like Unix.
  • Base-64 encoded names (which I believe is what YouTube is using, as in Artelius's answer).
  • Symbols in most programming languages.

Storing case sensitive data in case-insensitive system runs the risk of data inconsistency or even the loss of important information. Storing case insensitive data in a case sensitive system is, at worst, slightly inefficient. As you point out, if you only know the case-insensitive name of an object you're looking for, you need to adjust your query:

SELECT * FROM t WHERE LOWER(name) = 'something';

I note that in PostgreSQL (and presumably in other systems), it is a simple matter to create an index on the expression LOWER(name) which will be used in such queries.

Edmund
Indexes on expressions are kind of pokey though in PostgreSQL aren't they? Still it's a far cry better than what I've seen in MySQL when encountering this issue - trying to match on email addresses for example. We we're screwed because of case sensitivity - table scan on every query. Of course the guys that maintain that database could've lower cased all of the strings but they haven't and won't and there are millions of them so ugh. We solved it by replicating the table and switching it to case insensitive.
Khorkrak
Indexes on expressions have some limitations (e.g. requiring an immutable expression), and I guess they're not quite as efficient as simple indexes. But they work pretty well for this case. The only thing is to remember how to phrase the query to work with it: you need the exact index expression in your query somewhere.
Edmund
+1  A: 

Use a case insensitive index for your field. In most cases you don't want to manipulate the data in order to find it.

Gert G
hmm that's interesting. Is there such a thing in MySQL or is this yet another reason to switch to a real database like PostgreSQL?
Khorkrak
I'm not using MySQL, but I made some searches. You've already touched it... Collations: http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
Gert G
Right but doing collation coercions also renders indexes useless in MySQL for the columns involved because the engine has to apply the collation switch on each row before doing to comparison. Tried that one already :)
Khorkrak
A: 

One reason is for Content Management. Typically you will need to identify changes in content so that those changes can be reviewed, recorded and published. Case matters for human readable content. "Dave Doe" is correct. "dave doe" is plain wrong.

Case-sensitivity also matters for software developers. If you don't know the desired case-sensitivity for all your customers' systems then you may want to test case-senstivity as part of testing anyway.

dportas
A: 

I have worked on an application that involves a database with purely natural keys (i.e. 'codes') that should be case sensitive but are not necessarily so.

A lot of data would come out of the database in stored procs (with the database is doing the joins), where case sensitivity is not an issue. However some data needed to come from the database in separate queries and then be 'stitched together' in loops - primarily due to a complex data type that SQL couldn't easily work with - and this is where the problem arose. When I'm iterating two result sets and trying to join on the 'code', the values Productcode and ProductCode don't naturally match.

Rather than fixing the data, I had to change my code (C#) to do case insensitive string matching. Not throughout the entire solution, mind, just when looking through these 'codes' for matching.

If I had a case sensitive database I would've had tidier code.

Now, rather than 'why case sensitive', I'd really like to know why you'd want a case insensitive database. Is it due to laziness? I don't see any good reason that databases are case insensitive.

Kirk Broadhurst