I have been trying to do web scraping of a particular site and storing the results in a database. My original assumptions about the data allowed a schema where I could use fairly reasonable composite primary keys (usually containing only 2 or 3 fields) but as time went on, I realized that my original assumptions about the data were wrong and my primary keys were not as unique as I thought they were, so I have slowly been expanding them to contain more and more fields. In fact, I have recently come to believe that their database has no constraints whatsoever.

Just today, I finally expanded the primary key for one of my tables to contain every field in that table, and I thought now would be a good time to ask: is it better to add an auto-incrementing column that is just a unique id, or to leave a composite primary key on the entire table?

+4  A: 

You're better off with a single-column primary key than with a primary key made up of every field in the table.

First, your tools will have an easier time recognizing it. I'm sure there are half a dozen or so other reasons, but this seems like a no-brainer to me.

David Stratton
+1: Just don't use composite keys; they don't seem to solve as many problems as they cause.
S.Lott
No, you can have only one primary key, but that primary key can be a single column or multiple columns.
David Stratton
+3  A: 

Surrogate keys all the way - they're just easier to work with.

Then again, I have been playing a lot with Entity Framework and my view could be clouded by that.

Paul Smith
+1  A: 

@Jack - if you keep adding columns to a composite primary key, only to find out that it takes every column to make a row unique, then you don't know enough about how the database was created. I agree with you that adding an auto-incrementing PK is the solution.

JonH
+1  A: 

The only time I ever use a composite key is when it consists of two integer fields in a linking table for a many-to-many relationship. Otherwise, use a surrogate key and then put a unique index on the fields you would have put into the composite key. This way you save space in child tables, get the improved speed of an integer join (I would not use a GUID unless I was actually going to use replication), and still preserve the uniqueness of the natural key.
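A minimal sketch of that layout, using Python's built-in `sqlite3` module. The table name `scraped_item`, its columns, and the helper `insert_item` are hypothetical, chosen only for illustration; the question does not say which database or schema is in use.

```python
import sqlite3

# In-memory database, used here only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scraped_item (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        source_url TEXT NOT NULL,
        title      TEXT NOT NULL,
        scraped_on TEXT NOT NULL,
        UNIQUE (source_url, title, scraped_on)         -- natural key stays unique
    )
    """
)


def insert_item(conn, source_url, title, scraped_on):
    """Insert a row; return False if the natural key is already present."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO scraped_item (source_url, title, scraped_on) "
                "VALUES (?, ?, ?)",
                (source_url, title, scraped_on),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate natural key.
        return False


first = insert_item(conn, "http://example.com/a", "Widget", "2010-01-15")
second = insert_item(conn, "http://example.com/a", "Widget", "2010-01-15")
row_count = conn.execute("SELECT COUNT(*) FROM scraped_item").fetchone()[0]
```

Child tables would then reference `scraped_item.id` alone, while the `UNIQUE` constraint keeps rejecting duplicate rows.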

HLGEM
A: 

One way to get both the uniqueness of a large composite key and the convenience of a synthetic key is to use a secure hash of the values of all the fields. Personally I would SHA1 the contents of all the fields and then BASE64 or HEX encode that and use it as my key. You get the benefits of having a single column to deal with as well as the ability to tell if the data is already in the database by hashing all the fields and just doing a simple SELECT on the Primary Key to see if it already exists.
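A rough sketch of the hashing approach in Python, assuming every field is already a string. The names `row_key`, `store_if_new`, and the `scraped_row` table are hypothetical. Length-prefixing each field before hashing is an added precaution, not part of the description above; it keeps `("ab", "c")` and `("a", "bc")` from producing the same key.

```python
import hashlib
import sqlite3


def row_key(fields):
    """Return the hex-encoded SHA-1 of a sequence of string fields."""
    digest = hashlib.sha1()
    for value in fields:
        data = value.encode("utf-8")
        # Assumption: prefix each field with its byte length so that
        # field boundaries are part of the hashed input.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scraped_row (
        row_hash   TEXT PRIMARY KEY,  -- hex SHA-1 of all other fields
        source_url TEXT NOT NULL,
        title      TEXT NOT NULL
    )
    """
)


def store_if_new(conn, source_url, title):
    """Insert the row unless a row with the same hash already exists."""
    key = row_key((source_url, title))
    found = conn.execute(
        "SELECT 1 FROM scraped_row WHERE row_hash = ?", (key,)
    ).fetchone()
    if found is not None:
        return False
    with conn:
        conn.execute(
            "INSERT INTO scraped_row (row_hash, source_url, title) "
            "VALUES (?, ?, ?)",
            (key, source_url, title),
        )
    return True


added = store_if_new(conn, "http://example.com/a", "Widget")
added_again = store_if_new(conn, "http://example.com/a", "Widget")
```

Here the existence check is a single lookup on the primary key, as described above. Note that the hash must be recomputed whenever any field of a row changes.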

fuzzy lollipop