views: 59
answers: 4

How could I improve this query? Please tell me all my options here, as my social network DB is only getting bigger.

This query took 2.1231 sec:

SELECT friend_friend.friendid, friend_reg_user.disp_name, friend_reg_user.pic_url, friend_reg_user.online
FROM friend_friend
INNER JOIN friend_reg_user ON friend_friend.friendid = friend_reg_user.auto_id
WHERE userid = 1
AND friend_friend.status = 1
ORDER BY autoid DESC
LIMIT 59535, 15


##################################################################################################################################
# id # select_type # table           # type   # possible_keys # key     # key_len # ref                    # rows  # Extra       #
##################################################################################################################################
#  1 # SIMPLE      # friend_friend   # ref    # userid        # userid  # 5       # const                  # 59843 # Using where #
#  1 # SIMPLE      # friend_reg_user # eq_ref # PRIMARY       # PRIMARY # 4       # friend_friend.friendid # 1     #             #
##################################################################################################################################

What are my options when this table is, say, a million or even two million rows big? This table is used to determine who a user's friends are.

+2  A: 

I know a programmer who is working with 8 million records in his database, and it really doesn't change the speed that much. It is just about creating the right indexes and making sure that you are grabbing the data in an efficient way. (Numerical IDs for relationships are really useful.)

Also, your query is really barebones for the most part. Nothing too fancy. It might just be your server latency.
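
If you want to double-check which indexes those two tables actually have, SHOW INDEX will list them (table names taken from the query above):

-- Lists every index MySQL has on the two tables used by the query
SHOW INDEX FROM friend_friend;
SHOW INDEX FROM friend_reg_user;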

Chacha102
Yeah, I thought it was optimized as much as it could be, all the right indexes and stuff, but over 2 seconds is slow. This is run from localhost; maybe that could be why it's slow?
jasondavis
8 million records isn't really all that many... try seeing what happens when you reach 1 billion.
MarkR
I can attest to that: with a 150+ million row MyISAM table, queries are still snappy when they can use indexes efficiently.
nos
+1  A: 

Maybe I don't really understand your schema, but do you really need a LEFT JOIN? Could you not use an INNER JOIN?

(I've often heard it might be better for performance, as it returns fewer rows; in your case, if you want the friends of one guy, I don't see the point of a left join: friends would be "linked" and so have an entry in the "linking" table, no?)

Also, make sure you have indexes on the fields used:

  • in conditions (either "where" or "join" ones); seems OK here?
  • for sorting; does autoid have an index? (see the sketch just below)
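
For instance, a composite index covering both the filter and the sort might let MySQL avoid a filesort entirely (just a sketch; the index name is made up and you'd want to test it against your own data):

-- Hypothetical composite index: equality filters on (userid, status) first,
-- then autoid so the ORDER BY ... DESC can be read straight from the index
ALTER TABLE friend_friend
  ADD INDEX idx_userid_status_autoid (userid, status, autoid);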

MySQL is used with really big tables in some applications, and can answer really fast if the indexes / configuration are OK; so there is definitely something we should be able to do here ;-)

As a sidenote: you are prefixing almost all field names with the name of the table (because of duplicates in field names, I suppose); why don't you always do that? It would make the query just a bit easier to understand ;-)

Pascal MARTIN
Hi, actually the time posted, 2.1231 seconds, was with an INNER JOIN; I forgot to update it on here. The LEFT JOIN time was around 2.4231, so there was a minor improvement. And yes, there are indexes on all the right columns, and the autoid that is sorted on is a primary key, so it can't have an index, right? I mean, a primary key is an index? I think I have it optimized the best it can be, but 2 whole seconds is rather slow I think =(
jasondavis
ergh, too bad if all the necessary indexes are already there :-( (and yep, a PK is an index too). The next step would be denormalization ( http://en.wikipedia.org/wiki/Denormalization ) or sharding ( http://en.wikipedia.org/wiki/Sharding )... but that kinda makes things harder...
Pascal MARTIN
+1  A: 

As long as the columns in your WHERE clause are indexed, you should be okay. I would generate a hefty set of test data and run some benchmarks.
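
For example, a very rough way to inflate a scratch copy of the table for benchmarking (purely a sketch: it assumes autoid is AUTO_INCREMENT, that the remaining columns have defaults, and that the scratch table has no unique constraint on the copied columns):

-- Build a scratch copy and double its row count with each repeated INSERT
CREATE TABLE friend_friend_test LIKE friend_friend;

INSERT INTO friend_friend_test (userid, friendid, status)
SELECT userid, friendid, status FROM friend_friend;

-- run this statement a few times; each run doubles the table
INSERT INTO friend_friend_test (userid, friendid, status)
SELECT userid, friendid, status FROM friend_friend_test;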

Also, more importantly, acquaint yourself with MySQL's EXPLAIN syntax. It will help you to determine how many rows are actually being used in the query (amongst other things), and is a great tool for optimizing queries and table indexes.
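
For example, just prefix the query with EXPLAIN (that is exactly where the plan table in the question came from); EXPLAIN EXTENDED followed by SHOW WARNINGS adds a little more detail:

-- Shows which indexes are chosen and roughly how many rows each step examines
EXPLAIN
SELECT friend_friend.friendid, friend_reg_user.disp_name
FROM friend_friend
INNER JOIN friend_reg_user ON friend_friend.friendid = friend_reg_user.auto_id
WHERE friend_friend.userid = 1 AND friend_friend.status = 1
ORDER BY autoid DESC
LIMIT 59535, 15;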

Justin Johnson
A: 

You should find out what is causing it to be slow.

Does your database fit in memory? If not, get more - no, really. Disc is slow, no matter how you look at it.
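
A rough way to check that (the information_schema figures are only estimates, and innodb_buffer_pool_size is only the relevant knob if the tables are InnoDB):

-- Approximate data + index size per table, in MB
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS approx_mb
FROM information_schema.tables
WHERE table_schema = DATABASE();

-- Compare against the buffer pool (key_buffer_size is the MyISAM equivalent for indexes)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';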

If your query absolutely HAS to use disc (say your database is just FAR too big for reasonable memory, 100G+ say), then you should be trying to minimise the number of IO operations it requires.

In practice this means a certain amount of denormalisation (do you really need a join? Can you not store (copies of) all the needed fields on the xref table?), and judicious use of covering indexes.
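
As a sketch of the denormalisation idea (column definitions are illustrative only, and you would have to keep the copies in sync from application code or triggers whenever friend_reg_user changes):

-- Copy the displayed fields onto the xref table...
ALTER TABLE friend_friend
  ADD COLUMN disp_name VARCHAR(255),
  ADD COLUMN pic_url   VARCHAR(255),
  ADD COLUMN online    TINYINT;

-- ...so the friends-list query no longer needs the join at all
SELECT friendid, disp_name, pic_url, online
FROM friend_friend
WHERE userid = 1 AND status = 1
ORDER BY autoid DESC
LIMIT 59535, 15;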

In InnoDB (I assume you're using InnoDB here), the primary key is clustered. This means that queries which use the primary key do fewer IOs than queries using other indexes, because the index is stored with the data in the same pages; a secondary index, by contrast, often needs a potentially separate IO for each row it finds.
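
Purely as an illustration of the clustered-key point (not necessarily the right schema for this app): if the friendship table's primary key started with userid, all of one user's rows would sit together in the clustered index, so listing a user's friends touches far fewer pages:

-- Illustrative InnoDB layout only: rows are physically clustered by (userid, autoid),
-- so WHERE userid = ? scans a few contiguous pages instead of one page per row
CREATE TABLE friend_friend_clustered (
    userid   INT NOT NULL,
    friendid INT NOT NULL,
    status   TINYINT NOT NULL,
    autoid   INT NOT NULL,
    PRIMARY KEY (userid, autoid)
) ENGINE=InnoDB;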

The basic principle is:

  1. Reproduce the problem using production-levels of data on production-spec hardware in a non-production environment
  2. Diagnose what is causing it
  3. Make a change which you think may fix it
  4. Measure again, using the same production-spec non-production environment, to validate the performance of your fix.
  5. Repeat until you've got enough performance to solve the problem (to placate your customers etc)

And if it's successful, you can then follow whatever your normal QA procedure is (e.g. regression testing) to release the change.

In some cases, a change will require a major data migration and hence be a big headache to deploy (say you need to change the schema of 10Tb of data tables).

MarkR