views:

215

answers:

9

If I had a site where a user can flag another user post and it cannot be undone or changed, do I need to have a primary key? All my selects would be on the post_id and with a where clause to see if the user already flagged it.

+1  A: 

Best have one, if just because you may have to delete the occasional record manually (e.g. duplicates) and one should have a unique identifier for that.

Pekka
+1  A: 

The simple answer is yes. every table should have a primary key (made of at least one column). what benefit do you get for not having one?

Ehrann Mehdan
A: 

In such a situation, you might be able to get away without one, but I'd be inclined to throw a primary key on there anyway, simply because it's relatively simple to do and will save rework if the requirements change.

Adam Luchjenbroers
+2  A: 

Primary keys help speed up lookups and joins, so it's always nice to have if you can.

Tor Valamo
+2  A: 

You don't need a primary key, not even if users are going to modify rows. A primary key optimizes the performance every time you query that table though. If you think your table will grow larger than about a thousand rows or so, then setting a primary key will give a noticeable performance boost.

The only advantage in not creating a primary key really is that it means you don't have to create one, which is fair enough I suppose :-P

You could just not bother creating one for now. You can always add one later. Not a big deal. Don't let anyone bully you into thinking you absolutely must create a primary key right now! You'll see it being horribly slow soon enough :-P and then you can just add the primary key at that point. If you don't have too many duplicates by then :-P

Hugh Perkins
wouldnt it take up several bytes? i dont see how a PK can increase speed in this case
acidzombie24
+4  A: 

A primary key has nothing to do with whether data can be changed - it's a single point of reference for an entire row, which can make looking up and/or changing data faster.

All my selects would be on the post_id and with a where clause to see if the user already flagged it.

You need to provide more information about business rules. For example, should the system support more than one user flagging the same post?

If the answer is "no", then I would model a POST_STATUS_CODE table and have a foreign key to the table in your POSTS table.

If the answer is "yes", then I would still have a POST_STATUS_CODE table but also a table linking the POSTS and POST_STATUS_CODE tables - say POSTS_STATUS_XREF.

I have a post_flag table with post_id, user_id (who flagged it) and flag_type (ATM as a byte). I don't see how PK will make it faster in this case but I imagine it will take up 4 or 8 bytes per row. I was thinking about indexing post_id. If I do should I still create a PK?

At a minimum, I would make the primary key to be a combination of:

  • post_id
  • user_id

The reason being that a primary key ensures that there can't be duplicates.

A primary key can be made up of more than one column - this is called a compound key. It means that the pair of values is unique. IE: You can't have more than one combination of 1, 1 values, but you could have 1,2, 1,3, etc (and vice versa). Attempts to add duplicates will result in duplicate primary key errors.

OMG Ponies
i have a post_flag table with post_id, user_id (who flagged it) and flag_type (ATM as a byte). I dont see how PK will make it faster in this case but i imagine it will take up 4 or 8 bytes per row. I was thinking about indexing post_id. If i do should i still create a PK?
acidzombie24
an edit. cool. "At a minimum, I would make the primary key to be a combination of" what do you mean by make to be combination of? If i had a PK i may insert post_id and user_id and not need to check if the same entry exist already? (Which is reason enough to have it but i was going to write a where statement to ensure it doesnt exist)
acidzombie24
nevermind i can find out for myself ;)
acidzombie24
A: 

The software requirements may change rapidly. The customer may introduce new requirements. So having a primary key may be useful because you can eliminate totally unnecessary data migrations in such a situations.

Chathuranga Chandrasekara
A: 

Read this: "Is it OK not to use a Primary Key When I don’t Need one?"

Yes, you do need a primary key.

You may as well use text files for storage if you don't think you do because it means you don't understand them...

gbn
+9  A: 

It seems to me from some of your other posts that the reason you are trying to avoid adding a primary key to your table is to save space.

Stop thinking like that.

It's a bad idea to make non-standard optimizations like this without having tested them first to see if they actually work. Have you run some tests that shows that you save a significant amount of space in your database by omitting the primary key on this table? Or are you just guessing?

Using a primary key doesn't necessarily mean that you will use more space. Depending on the database, if you omit the primary key it might add a hidden field for you anyway (for example if you don't have a PK in MySQL/InnoDB it adds a hidden clustered index on a synthetic column containing 6 byte row ID values (source)). If you do use a primary key, rather than adding a new column you can just choose some existing columns that you know should be unique anyway. It won't take up any more space, it will just mean that the data will be stored in a different order to make it easier to search.

When you add an index, that index is going to take up extra space, as an index is basically just a copy of a few columns of the table, plus a link back to the row in the original table. Remember that hidden column the database uses when you don't have a PK? Well now it has to use that to find your rows, so you'll get a copy of it in your index too. If you use a primary key then you probably don't need one of your indexes that you would have added, so you're actually saving space here.

Besides all this, some useful database tools just won't work well if you don't have a primary key on your table. You will annoy everyone that has to maintain your database after you are gone.

So tell me, why do you think it's a good idea to NOT have one?

Mark Byers
haha, great answer. You actually -answered- the question
acidzombie24
Great Answer. To add, I would say that his primary key's would actually be user_id + post_id via an M:N table anyway. With indexing on the 2 FK's, the lookups will be very fast in either direction. So the question becomes, performance or space? Which is really going to cost you the most in today's computing?
Kevin Peno