ansaurus

Answer 1

+3 A:

I have heard some people talk about using some sort of hash table for faster lookups. Would something like that be possible here?

It's called an index, and you should add one to each column that you're planning to use in a JOIN (or to match with an explicit constraint like >, >=, =, <=, < or an IN () clause, which matches only items in a stated list). This way the database server can jump right to the correct entries in the index, rather than having to do a brute-force search through all of the table rows. It's exactly like an index in a book. If you wanted to find the pages in a book on which the name "Knuth" appears, you have two choices. If the book has an index, you look in the index and hope the name is there. If the book does not have an index, you'll just have to read through the whole thing yourself, and that will take much longer.
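To see the "jump right to the entries" effect for yourself, here is a minimal sketch. It assumes SQLite (bundled with Python) as a stand-in for MySQL, since it needs no server; the table layout and the index name idx_friends_user_id are made up for illustration. The exact wording of the plan differs between SQLite versions, and MySQL's EXPLAIN output looks different again, but the switch from a scan to an index search is the same idea.

```python
import sqlite3

# SQLite stands in for MySQL here; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends (user_id INTEGER, friend_id INTEGER, active INTEGER)"
)
conn.executemany(
    "INSERT INTO friends VALUES (?, ?, ?)",
    [(i % 1000, i, 1) for i in range(50_000)],
)


def query_plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the description


lookup = "SELECT friend_id FROM friends WHERE user_id = 42"

plan_without_index = query_plan(lookup)
conn.execute("CREATE INDEX idx_friends_user_id ON friends (user_id)")
plan_with_index = query_plan(lookup)

print("before:", plan_without_index)  # should describe a full-table SCAN
print("after: ", plan_with_index)     # should describe a SEARCH using the index
```

The first plan is the "read the whole book" case; the second is the "look it up in the index" case.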

If you care about ordering/sorting (or doing any kind of relative numeric/string comparison), it should be a sorted index. Otherwise it can be a hashtable index, which is faster for tables with lots of rows, but carries no sorting information. These types of details are likely to have different syntaxes/options depending on which type of database server software is used.** (see note below)

Note that primary keys already have an automatically generated index, so you don't have to add one yourself. Note also that if you have a multiple-column primary key, e.g. (State, City, Zipcode), then there will effectively be indices on the leftmost prefixes of the primary key, e.g. you get an index on State, and (State,City), and (State,City,Zipcode) for free, but if you want to JOIN on Zipcode or City or (City,Zipcode) then you need to create your own indices in addition to those provided by the primary key.
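The leftmost-prefix behaviour can be checked with a small experiment. This is a sketch under assumptions: it again uses SQLite from Python's standard library rather than MySQL, and the places table and its population column are hypothetical. SQLite's planner follows the same leftmost-prefix rule for a composite primary key, so it is a reasonable stand-in for the idea, though not for MySQL's exact behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE places (
        state TEXT,
        city TEXT,
        zipcode TEXT,
        population INTEGER,
        PRIMARY KEY (state, city, zipcode)
    )
    """
)


def access_kind(where_clause):
    """Return the first word of the query plan: SEARCH (index used) or SCAN."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM places WHERE " + where_clause
    ).fetchall()
    return rows[0][3].split()[0]


where_clauses = [
    "state = 'NY'",                              # leftmost prefix
    "state = 'NY' AND city = 'Albany'",          # leftmost prefix
    "zipcode = '12207'",                         # not a prefix
    "city = 'Albany' AND zipcode = '12207'",     # not a prefix
]
results = {clause: access_kind(clause) for clause in where_clauses}

for clause, kind in results.items():
    print(f"{kind:7} {clause}")
```

The two prefix lookups can use the primary key's index, while the Zipcode and (City,Zipcode) lookups fall back to scanning until you add indices of your own.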

In your case, it looks like you should have indices on these columns (I've *-ed the columns I am assuming are already primary keys). Unless the numerical order of your user IDs is significant, those would be good candidates for hashtable indices.

Users.user_id*
Friends.user_id
Friends.friend_id
Friends.active
Actions.user_id
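If it helps, the statements for the non-starred columns above could be generated like this. It is only a sketch: the idx_<table>_<column> naming is a made-up convention, and the optional index_type argument simply appends a USING clause in the form MySQL's CREATE INDEX accepts.

```python
# Columns from the list above that still need an index. Users.user_id is
# skipped because it is assumed to be a primary key, which is indexed already.
COLUMNS_TO_INDEX = [
    ("Friends", "user_id"),
    ("Friends", "friend_id"),
    ("Friends", "active"),
    ("Actions", "user_id"),
]


def create_index_sql(table, column, index_type=None):
    """Build a MySQL CREATE INDEX statement for a single column.

    The index name is a hypothetical convention, not something MySQL requires.
    """
    name = f"idx_{table.lower()}_{column}"
    sql = f"CREATE INDEX {name} ON {table} ({column})"
    if index_type:  # e.g. "BTREE" or "HASH"
        sql += f" USING {index_type}"
    return sql + ";"


statements = [create_index_sql(table, column) for table, column in COLUMNS_TO_INDEX]
for statement in statements:
    print(statement)
```

Run the printed statements once against your database; after that the server maintains the indices automatically on every insert and update.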

**For MySQL, you add a clause to the CREATE INDEX statement that says USING HASH for a hashtable index, or USING BTREE for a sorted index... ignore RTREEs, as those are for spatial data. Note also that MySQL doesn't allow HASH indices on the common storage engines InnoDB and MyISAM. Really large datasets that need high performance probably need to have data mirrored on an in-memory table with a HASH index. With 50,000 rows you probably don't need to worry about it; a BTREE's search time is O(log n) whereas HASH is O(1), and there's probably not that much difference. BTREEs are very wide and are designed not to be deep; to add a single additional level to the tree (one more step in the search), you may need to increase the number of rows by a factor of 10 or 100.
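To put rough numbers on that last point, here is a back-of-the-envelope sketch. The fanout of 100 keys per node is purely an assumption for illustration (real fanout depends on key size and page size), and btree_levels is a hypothetical helper, not anything MySQL exposes.

```python
def btree_levels(rows, fanout):
    """Estimate how many levels a B-tree needs to address `rows` entries,
    assuming every node holds `fanout` keys (an idealised, full tree)."""
    levels = 1
    capacity = fanout  # entries reachable with the current number of levels
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


ASSUMED_FANOUT = 100  # illustrative only

for rows in (50_000, 500_000, 5_000_000):
    print(f"{rows:>9} rows -> {btree_levels(rows, ASSUMED_FANOUT)} levels")
```

Under this assumption, 50,000 rows and 500,000 rows need the same number of levels, and it takes a hundredfold increase to add one more. That is why the gap between O(log n) and O(1) is small in practice at this scale.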

Jason S 2009-07-24 00:33:28

More reading on indexes http://www.alistapart.com/articles/indexing-the-web-its-not-just-googles-business/

Nico Burns 2009-07-24 00:36:21

I should have mentioned this will be MySQL. Is there a special way of adding a hash index, or is a regular index the same thing?

jasondavis 2009-07-24 00:58:12

See the note I added. I probably shouldn't have gone into that much detail. The difference in performance between BTREE and HASH indices is likely to be much smaller than the difference in performance between BTREE and no index.

Jason S 2009-07-24 01:13:18


need help with mysql joins
