I'm trying to offer a feature where I can show the pages most viewed by friends. My friendships table has 5.7M rows and the views table has 5.3M rows. At the moment I just want to run a query on these two tables and find the 20 page IDs most viewed by a person's friends.

Here's the query as I have it now:

SELECT page_id 
FROM `views` INNER JOIN `friendships` ON friendships.receiver_id = views.user_id 
WHERE (`friendships`.`creator_id` = 143416) 
GROUP BY page_id 
ORDER BY count(views.user_id) desc 
LIMIT 20

And here's how an explain looks:

+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-----------------------------------------+------+----------------------------------------------+
| id | select_type | table       | type | possible_keys                           | key                             | key_len | ref                                     | rows | Extra                                        |
+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-----------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | friendships | ref  | PRIMARY,index_friendships_on_creator_id | index_friendships_on_creator_id | 4       | const                                   |  271 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | views       | ref  | PRIMARY                                 | PRIMARY                         | 4       | friendships.receiver_id                 |   11 | Using index                                  | 
+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-----------------------------------------+------+----------------------------------------------+

The views table has a primary key of (user_id, page_id), and you can see this is being used. The friendships table has a primary key of (receiver_id, creator_id), and a secondary index of (creator_id).

If I run this query without the group by and limit, there are about 25,000 rows for this particular user - which is typical.

On the most recent real run, this query took 7 seconds to execute, which is way too long for a decent response in a web app.

One thing I'm wondering is if I should adjust the secondary index to be (creator_id, receiver_id). I'm not sure that will give much of a performance gain though. I'll likely try it today depending on answers to this question.
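
For the record, the index change I'm considering would look something like this (the new index name is just a placeholder):

ALTER TABLE friendships
  DROP INDEX index_friendships_on_creator_id,
  ADD INDEX index_friendships_on_creator_receiver (creator_id, receiver_id);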

Can you see any way the query can be rewritten to make it lightning fast?

Update: I need to do more testing on it, but it appears my nasty query works out better if I don't do the grouping and sorting in the db, but do it in Ruby afterwards. The overall time is much shorter - by about 80% it seems. Perhaps my early testing was flawed - but this definitely warrants more investigation. If it's true - then wtf is MySQL doing?

+1  A: 

As far as I know, the best way to make a query like that "lightning fast" is to create a summary table that tracks friend page views per page per creator.

You would probably want to keep it up-to-date with triggers. Then your aggregation is already done for you, and it is a simple query to get the most viewed pages. You can make sure you have proper indexes on the summary table, so that the database doesn't even have to sort to get the most viewed.
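
Just to illustrate, here's a rough sketch of what that might look like - the table name, column names, and trigger name below are made up, and it assumes views gets one row per (user_id, page_id):

CREATE TABLE friend_page_views (
  creator_id INT NOT NULL,
  page_id    INT NOT NULL,
  view_count INT NOT NULL DEFAULT 0,
  PRIMARY KEY (creator_id, page_id),
  KEY idx_creator_count (creator_id, view_count)
);

-- Bump the counter for every friend of the viewer whenever a view is recorded.
DELIMITER //
CREATE TRIGGER views_after_insert AFTER INSERT ON views
FOR EACH ROW
BEGIN
  INSERT INTO friend_page_views (creator_id, page_id, view_count)
  SELECT f.creator_id, NEW.page_id, 1
  FROM friendships f
  WHERE f.receiver_id = NEW.user_id
  ON DUPLICATE KEY UPDATE view_count = view_count + 1;
END//
DELIMITER ;

-- The read side is then a plain lookup: no join, no GROUP BY, and the
-- (creator_id, view_count) index lets it avoid a sort.
SELECT page_id
FROM friend_page_views
WHERE creator_id = 143416
ORDER BY view_count DESC
LIMIT 20;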

Summary tables are the key to maintaining good performance for aggregation-type queries in read-mostly environments. You do the work up-front, when the updates occur (infrequent) and then the queries (frequent) don't have to do any work.

If your stats don't have to be perfect, and your writes are actually fairly frequent (which is probably the case for something like page views), you can batch up views in memory and process them in the background, so that viewers don't take the hit of keeping the summary table up to date as they view pages. That solution also reduces contention on the database (fewer processes updating the summary table).

nathan
Thanks for the suggestion Nathan. Writes are pretty frequent - about 100 per read. There may be some ways I can track only a limited set though. I'll give it some thought.
Tim Haines
Are the writes happening from multiple threads? If so, I would definitely consider aggregating them in memory, and having a single thread/connection perform all the writes. Then you won't have many connections contending, writing to the same tables.
nathan
A: 

Hi Tim,

You should absolutely look into denormalizing this table. If you create a separate table that maintains the user IDs and the exact counts for every page they viewed, your query should become a lot simpler.

You can easily maintain this table with a trigger on your views table that updates the 'views_summary' table whenever an insert happens on the 'views' table.
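
Roughly something along these lines - the column names and trigger name are just placeholders, and it assumes a view can be recorded more than once per user per page:

CREATE TABLE views_summary (
  user_id    INT NOT NULL,
  page_id    INT NOT NULL,
  view_count INT NOT NULL DEFAULT 0,
  PRIMARY KEY (user_id, page_id)
);

DELIMITER //
CREATE TRIGGER views_summary_sync AFTER INSERT ON views
FOR EACH ROW
BEGIN
  -- Insert the pair on first sight, otherwise just increment the count.
  INSERT INTO views_summary (user_id, page_id, view_count)
  VALUES (NEW.user_id, NEW.page_id, 1)
  ON DUPLICATE KEY UPDATE view_count = view_count + 1;
END//
DELIMITER ;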

You might even be able to denormalize this further by looking at the actual relationships, or just maintain the top X pages per person.

Hope this helps,

Evert

Evert
Hi Evert, thanks for your answer. It's a little trickier than maintaining a view count for each page per user, because I'm counting the number of friends that have viewed the page - not the total number of users.
Tim Haines
Yeah, I realize this is not the final solution, but it will already simplify your query greatly. At that point you can maintain cached lists of your users and their top pages and sort within your frontend layer without touching your database at all.
Evert
A: 

Your indexes look correct, although if friendships has very big rows you might want the index on (creator_id, receiver_id) to avoid reading the full rows.

However, something's not right here - why are you doing a filesort for 271 rows? Make sure that your MySQL has at least a few megabytes for tmp_table_size and max_heap_table_size. That should make the GROUP BY faster.

sort_buffer should also have a sane value.
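
For example (the actual server variable is sort_buffer_size; the numbers below are only illustrative, not recommendations):

SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
SHOW VARIABLES LIKE 'sort_buffer_size';

-- In-memory temp tables are capped at min(tmp_table_size, max_heap_table_size),
-- so raise the two together. GLOBAL changes only affect new connections.
SET GLOBAL tmp_table_size       = 64 * 1024 * 1024;
SET GLOBAL max_heap_table_size  = 64 * 1024 * 1024;
SET SESSION sort_buffer_size    = 4 * 1024 * 1024;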

tmp_table_size is 33 MB, max_heap_table_size is 16 MB, sort_buffer is 2 MB. I agree something's not right. Ruby handles the group and sort much faster than MySQL at the moment.
Tim Haines
So that's an effective tmp_table_size of 16 MB (it's min(tmp_table_size, max_heap_table_size)), which is enough for 25K integers. Sending all that data to Ruby can't be good - have you tried it with only the GROUP BY?
An integer takes 640 bytes? Don't you mean 250k integers? I'll try increasing the tmp_table_size significantly and see if it has an impact.
Tim Haines
Read 'enough' as 'sufficient'. An integer should only take 4-8 bytes.