views:

590

answers:

2

We have a legacy database that uses strings as primary keys. I want to implement objects on top of that legacy database to better implement some business logic and provide more functionality to the user.

I have read in places that using strings for primary keys on tables is bad. I'm wondering why this is? Is it because of the case-sensitivity issues? character sets?

... why is this particularly bad for NHibernate?

... and following up on that ... if strings do make bad primary keys, is it worth it to replace the primary keys in the database with ints or GUIDs or the like? (we only have about 25-30 tables involved)

+5  A: 

Okay, I will have a stab at this. I will give a couple of quick caveats - I am not an expert on databases and my experience is with Hibernate (Java) rather than NHibernate, but here goes.

I think the issue of primary keys as strings is to do with the SQL data-type that is used to represent them in the database. Because the primary key is used all the time when inserting, querying and so on, the database engine has to spend lots of time comparing primary keys. If you are using numbers, these are simply stored as bytes which computers are really good at doing stuff with quickly. As soon as you start using strings, the cost of these operations (comparisons mainly) goes up significantly. Even if the database engine is using really neat strategies to compare keys, it will still always be faster to compare bytes as bytes rather than strings.

On modern hardware though, this is becoming much less an issue than it used to be, and with indexes the problem almost disappears.

I don't know for sure about why this is really bad in Hibernate (and NHibernate) but in my experience, because my application has a complex graph of objects that often have references to other persisted objects, often as lists or sets, the references are all stored using the ID of the other object, and because of the rules I have in place for cascading saves, fetching and so on, this will mean that the primary keys are being used ALL the time. Hibernate - which I quite like - tends to do exactly what its told to, and sometimes people (especially me!) tell it to do really dumb things. As a result, even seemingly simple updates or queries end up generating quite complex SQL.

So - in summary - strings as primary keys are bad due to cost of simple operations on them and using Hibernate may magnify this. In practice though, modern database engines have lots of neat strategies to ensure that the performance hit is not that bad. (Postgres - and presumably others - by default create indexes for primary keys)

For your follow up - should you replace your keys? Well, that depends on the performance of your application. If performance is critical, then for a high volume and very intensive application it may be a good idea, otherwise there will probably be minimal benefit, with the downside of having to spend time changing all your tables. You could expect to get much better results refining the strategies you are using with NHibernate (ie fetching strategies and when you are cascading saves and so on).

andy K
+1  A: 

Andy K seems to imply that strings are not stored as bytes. That would be funny! In fact it all depends on how long the string PK is and what collation you use. It might be even faster than bigint or int identity and will almost definitely be faster than Guids. If these strings are something you'd have to search by anyway, then you would need an index (perhaps even clustered index) on them anyway, so why not make them PKs!

zvolkov