What are some reasons for choosing a case sensitive collation over a case insensitive one? I can see perhaps a modest performance gain for the DB engine in doing string comparisons. Is that it? If your data is set to all lower or uppercase then case sensitive could be reasonable but it's a disaster if you store mixed case data and then try to query it. You have to then say apply a lower() function on the column so that it'll match the corresponding lower case string literal. This prevents index usage in every dbms that I've used. So wondering why anyone would use such an option.
Depends on the data you want to store. Most UNIX filesystems are databases with case-sensitive keys. YouTube videos seem to be organised with case-sensitive keys.
Most of the time you want case-insensitive searches, but clearly there are certain exceptions.
There are many examples of data with keys that are naturally case sensitive:
- Files in a case sensitive filesystem like Unix.
- Base-64 encoded names (which I believe is what YouTube is using, as in Artelius's answer).
- Symbols in most programming languages.
Storing case sensitive data in case-insensitive system runs the risk of data inconsistency or even the loss of important information. Storing case insensitive data in a case sensitive system is, at worst, slightly inefficient. As you point out, if you only know the case-insensitive name of an object you're looking for, you need to adjust your query:
SELECT * FROM t WHERE LOWER(name) = 'something';
I note that in PostgreSQL (and presumably in other systems), it is a simple matter to create an index on the expression LOWER(name)
which will be used in such queries.
Use a case insensitive index for your field. In most cases you don't want to manipulate the data in order to find it.
One reason is for Content Management. Typically you will need to identify changes in content so that those changes can be reviewed, recorded and published. Case matters for human readable content. "Dave Doe" is correct. "dave doe" is plain wrong.
Case-sensitivity also matters for software developers. If you don't know the desired case-sensitivity for all your customers' systems then you may want to test case-senstivity as part of testing anyway.
I have worked on an application that involves a database with purely natural keys (i.e. 'codes') that should be case sensitive but are not necessarily so.
A lot of data would come out of the database in stored procs (with the database is doing the joins), where case sensitivity is not an issue. However some data needed to come from the database in separate queries and then be 'stitched together' in loops - primarily due to a complex data type that SQL couldn't easily work with - and this is where the problem arose. When I'm iterating two result sets and trying to join on the 'code', the values Productcode
and ProductCode
don't naturally match.
Rather than fixing the data, I had to change my code (C#) to do case insensitive string matching. Not throughout the entire solution, mind, just when looking through these 'codes' for matching.
If I had a case sensitive database I would've had tidier code.
Now, rather than 'why case sensitive', I'd really like to know why you'd want a case insensitive database. Is it due to laziness? I don't see any good reason that databases are case insensitive.