views:

167

answers:

4

I am working on existing DB and try to optimize it. I see a table without a single primary key but with two foreign keys work as a primary key. I know it will work. however, is it better to have one single primary key with two foreign keys for better performance or primary (key1, key2) will just work as good as one?

For example:

CREATE TABLE  ABC (
     'xid' int(11),
    'yid' int (11),
PRIMAY KEY (xid, yid)
)

does it perform the same (in terms of indexing) as:

CREATE TABLE ABC (
  'id' int(11),
  'yid' int (11),
  'xid', int (11),
  PRIMARY KEY (id),
 KEY (xid, yid)
)


some updates there

so, I do some testing.. using simple queries on three different tables

Table myA, with 10,000+ records. only has userid as the primary index.

SELECT * FROM myA where userid=12345

id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra
1   SIMPLE           myA const  PRIMARY           PRIMARY  4  const  1

Table myB, is a many-to-many table, with primary id, and userid as one of two foreign keys. with over 50,000 records

SELECT * FROM myB where userid=12345

id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra
1   SIMPLE           myB  ref  userid           userid  4  const  53

Table myC, is also a many-to-many table, but with composite primary key, userid is one of the two. with over 100,000 records

SELECT * FROM myC where userid=12345

id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra
1   SIMPLE           myC ALL  NULL           NULL  NULL  NULL  101289  Using where

So, table C actually examine all 100,000 records!!! (the query only returns 50 records)

There are something that i don't understand... it seems to me composite primary isn't doing the job.


MORE..

While I'm doing more testing and "Explain", I conclude that (in MySQL), even you set composite keys as primary key. You still have to explicitly set index for all keys. Then you will enjoy indexing.

+1  A: 

Adding the id column is only worth doing if you are exposing the contents to the user. Either way, it's a typical many-to-many join table.

To change the primary key to a single column means adding a unique key constraint onto the two foreign key columns - there's no need because you get the uniqueness from defining the columns as the primary key, even if it is a composite key. A composite key is a combination of 2+ columns.

If you aren't selecting by that id column, there's no need for it.

OMG Ponies
+1  A: 

What you're describing is a composite primary key, which is acceptable and a sensible pattern. A lot of applications use an artificial primary key (usually an int or guid) when there is a perfectly acceptable candidate key already. This adds work for the database, but in some cases makes the application easier to write.

There is no real reason to add an extra primary key when there is already a candidate. I'd lean towards not doing so.

You also gain performance in some cases by not adding an unnecessary auto-generated primary key column, and it reduces the space usage of your database (albeit only slightly)

MarkR
+4  A: 

In the second example you show, the xid, yid columns are indexed but nothing prevents your application from entering the same pair of xid, yid on multiple rows of the ABC table:

INSERT INTO ABC (xid, yid) VALUES (123, 456), (123, 456); -- NO ERROR

You can get unintentional duplicates this way, and it can cause strange effects when you do joins and counts. Also if you need to update a row to change the association between a given xid and its yid, you could update one row and not the other(s).

You should at least declare the key over (xid, yid) as a UNIQUE KEY to prevent duplicates.

The first example you show uses a compound primary key (some people say composite primary key). SQL supports multi-column indexes, and multi-column constraints. There's no downside to doing this, except that if you want to run a query to pick one row out, you need to use two columns instead of one in the condition that identifies the row.

DELETE FROM ABC WHERE xid = 123 AND yid = 456;

Likewise if another table contains a foreign key to reference the ABC table, it would have to have both columns.

Enough programmers find using two columns to be so burdensome and confusing that they'd rather add a single-column surrogate key.

Insisting on a superfluous surrogate key when none is needed is something I consider to be an SQL Antipattern.


Re your updated question above: Are you aware that a compound index only helps when your search includes the left-most columns in the index? This is true of any composite index in any brand of RDBMS. Example:

CREATE TABLE myC (
  somethingid INT,
  userid      INT,
  PRIMARY KEY (somethingid, userid)
);

SELECT * FROM myC WHERE userid = 12345;

This query cannot use the primary key index.

The classic example to explain compound index usage is the telephone book analogy: If I ask you to search for everyone whose last name is "Thomas" you can use the fact that the phone book is ordered by last name to help make your search quick. But if I ask you to search for everyone whose first name is "Thomas," you have to search every page. The phone book is like a compound index on (last_name, first_name). So if your search doesn't include last_name, you must resort to a brute-force search.

You can also create an extra index just for the other column, so you can do a search using that as the criterion. You don't need an extra single-column index for the first column. The compound index is adequate.

CREATE TABLE myC (
  somethingid INT,
  userid      INT,
  PRIMARY KEY (somethingid, userid),
  KEY (userid)
);

Usually if that column is declared as a foreign key, the RDBMS should create an index automatically. However, in some versions of some RDBMS products, you have to create an index on a foreign key column yourself, as a separate action.

Bill Karwin
A: 

There's one other cool thing about primary keys which is that the data is actually ordered by the primary key on the disk. So there is a slight difference between even a unique index and a primary key when it comes to performance. It's probably not going to be much of a performance difference though, just how much time it takes to actually get the data off of the disk.

Chuck Vose