I'm implementing a database where several tables have string data as candidate keys (eg: username) and will be correspondingly indexed. For these fields I want:
Case insensitivity when someone queries the table on those keys
The initially written case to be preserved somehow so that the application can present the data to the user with the original case used
I also want the database schema to be as database-independent as possible, as the application code is not (or should not be) slaved to a particular RDBMS.
Also worth noting is that the vast majority of queries done on the database will be done by the application code, not via direct table access by the client.
In implementing this, I'm running into a lot of annoying issues. One is that not all RDBMSs implement COLLATE (which is where case sensitivity appears to be tunable at the schema level) in the same way. Another issue is that the collation and case sensitivity options can be set at multiple levels (server, database, table (?), column) and I can't guarantee to the application what setting it will get. Yet another issue is that COLLATE itself can get hairy because there is a heck of a lot more in there than simply case sensitivity (eg: Unicode options).
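As a small illustration of the kind of surprise I mean, here is a sketch using SQLite through Python's built-in sqlite3 module (SQLite is only my example here; other engines will behave differently, which is the point). The `equal` helper is a name I made up for the demo. As far as I can tell, SQLite's default comparison is case sensitive, and its built-in NOCASE collation only folds ASCII letters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def equal(a, b, collation=None):
    """Ask SQLite whether two strings compare equal, optionally under a named collation.

    Hypothetical helper for this demo only; the collation name is trusted input.
    """
    sql = "SELECT ? = ?" + (f" COLLATE {collation}" if collation else "")
    return bool(conn.execute(sql, (a, b)).fetchone()[0])

# Default (BINARY) comparison distinguishes case.
default_ascii = equal("Fred", "fred")
# NOCASE folds the 26 ASCII letters...
nocase_ascii = equal("Fred", "fred", "NOCASE")
# ...but leaves non-ASCII letters such as 'É' / 'é' alone.
nocase_accented = equal("Émile", "émile", "NOCASE")

print(default_ascii, nocase_ascii, nocase_accented)
```

So even "just make the column case insensitive" means something different depending on the engine and the collation chosen.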
To avoid all of these headaches, what I'm considering is dodging the issue altogether by storing two columns for one piece of data. One column with the original case, another dropped to lower case by the application layer.
eg: Two of the fields in the table
user_name = "fredflintstone" (a unique index on this one)
orig_name = "FredFlintstone" (just data... no constraints)
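To make the idea concrete, here is a minimal sketch of what I have in mind, again assuming SQLite via Python's sqlite3 purely for illustration. The table name `users`, the index name, and the helpers `add_user` and `find_user` are all made up for this example; only the two column names come from my design above. The application lower-cases the key on every write and every lookup, so the database only ever does plain comparisons on `user_name`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        user_name TEXT NOT NULL,  -- lower-cased key, written by the application
        orig_name TEXT NOT NULL   -- original case, kept for display only
    )
    """
)
conn.execute("CREATE UNIQUE INDEX users_user_name ON users (user_name)")

def add_user(name):
    """Store the name twice: lower-cased as the key, untouched for display."""
    conn.execute(
        "INSERT INTO users (user_name, orig_name) VALUES (?, ?)",
        (name.lower(), name),
    )

def find_user(name):
    """Look up by the lower-cased key; return the originally written case, or None."""
    row = conn.execute(
        "SELECT orig_name FROM users WHERE user_name = ?",
        (name.lower(),),
    ).fetchone()
    return row[0] if row else None

add_user("FredFlintstone")
print(find_user("FREDFLINTSTONE"))
```

With this in place, a second insert that differs only in case (say "fredflintstone") should be rejected by the unique index, regardless of how the server or database collation is configured.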
The pros and cons of this as I see it are:
Pros:
No ambiguity - the application code will manage the case conversions and I never need to worry about unit tests failing "mysteriously" when the underlying RDBMS or its settings change.
Searches on the index will be clean and never be slowed down by collation features or calls to LOWER() or anything (assuming such things slow down the index, which seems logical)
Cons:
Extra storage space required for the doubled-up data
It seems a bit brutish
I know it will work, but at the same time it smells wrong.
Is it insane/pointless to do this? Is there something I don't know that makes the case sensitivity issue less tricky than it seems to me at the moment?