tags:

views:

72

answers:

1
Users table
 user_id
 pic_url
 name

friends table
 auto_id
 userid
 friendid
 status

actions table
 auto_id
 userid
 type
 subject
 body
 datetime

I want to make a friend stream of updates thats shows updates, could be a blog post, status change, anything but should only show the ones that are from a friend of the logged in user

Here is what I came up with but my userbase is very large so performance is a must, is there a better way of doing this? Please show me

SELECT u.user_id, u.pic_url, u.name, a.auto_id, a.userid, a.type, a.subject, a.body, a.datetime
FROM actions AS a
LEFT JOIN users AS u ON u.auto_id=a.userid
LEFT JOIN friends AS f ON f.userid=a.userid
WHERE f.friendid=1 //1 would be my user ID 
AND f.status=active

Please help I don't think this is correct.

Lets say there is 50,000 users and my userid is #1 and I am friends with 20,000 users, it should return all entries in the actions table that is published by a user I am friends with, also need to modify to include actions from myself

I have heard some people talki about using some sort of hash table for faster lookups would something like that be possible here?

Thanks for any help

+3  A: 

I have heard some people talki about using some sort of hash table for faster lookups would something like that be possible here?

It's called an index, and you should add one to each column that you're planning on using a JOIN (or matching with an explicit constraint like >, >=, =, <=, < or an IN () clause which matches only items in a stated list). This way the database server can jump right to the correct entries in the index, rather than having to do a brute force search through all of the table rows. It's exactly like an index in a book. If you wanted to find the pages in a book in which the name "Knuth" appears, you have two choices. If the book has an index, you look in the index and hope the name is there. If the book does not have an index, you'll just have to read through the whole thing yourself, and that will take much longer.

If you care about ordering/sorting (or doing any kind of relative numeric/string comparison), it should be a sorted index. Otherwise it can be a hashtable index, which is faster for tables with lots of rows, but carries no sorting information. These types of details are likely to have different syntaxes/options depending on which type of database server software is used.** (see note below)

Note that primary keys already have an automatically generated index, so you don't have to add one yourself. Note also that if you have a multiple-column primary key, e.g. (State, City, Zipcode) then there will effectively be indices on the leftmost subsets of the primary key, e.g. you get an index on State, and (State,City), and (State,City,Zipcode) for free, but if you want to JOIN on Zipcode or City or (City,Zipcode) then you need to create your own indices in addition to those provided by the primary key.

In your case, it looks like you should have indices on these columns (I've *-ed the columns I am assuming are already primary keys). Unless you have any significance on the numerical order of your user IDs, those would be good candidates for hashtable indices.

Users.user_id*
Friends.user_id
Friends.friend_id
Friends.active
Actions.user_id


**For MySQL, you add a clause to the CREATE INDEX statement that says USING HASH for a hashtable index, or USING BTREE (for a sorted index)... ignore RTREEs as those are for spatial data. Note also that MySQL doesn't allow HASH indices on the common storage engines InnoDB and MyISAM. Really large datasets that need high-performance probably need to have data mirrored on an in-memory table with a HASH index. With 50,000 rows you probably don't need to worry about it; a BTREE's search time is O(log n) whereas HASH is O(1) and there's probably not that much difference. BTREEs are very wide and are designed not to be deep; to require a single additional comparison in the search step, you may need to increase the # of rows by a factor of 10 or 100.

Jason S
More reading on indexes http://www.alistapart.com/articles/indexing-the-web-its-not-just-googles-business/
Nico Burns
I should of mentioned this will be mysql, Is there a special way of adding a hash index or is a regular index the same thing?
jasondavis
see the note I added. I probably shouldn't have gone into that much detail. The difference in performance between BTREE and HASH indices is likely to be much smaller than the difference in performance between BTREE and no index.
Jason S