
I've developed a user rating system that analyzes users and saves their information, along with a score, in a database.

I'm getting close to 1 million users rated and stored.

I'm having issues with taking a certain set of users from the table (score < 50) and then comparing their ids against another set of ids without the whole thing crashing.

The result of the (score < 50) query is around 65k rows and the comparison is against roughly 1,000 user ids, so the whole thing is doing about 65k * 1,000 checks.

Is my bottleneck at the db? Or is it at the comparison of ids? Is there a better way to split this up?

Query -> "select username, userscore from users where userscore < 50"

then

Foreach compares values
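
It's basically this (simplified; column names may differ from my real code):

    // $lowScoreUsers holds the ~65k rows from the query above,
    // $otherIds holds the ~1,000 ids I'm comparing against.
    foreach ($lowScoreUsers as $user) {
        foreach ($otherIds as $id) {
            if ($user['id'] == $id) {
                // match - do something with this user
            }
        }
    }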

A: 

Seems easy enough to figure out whether it's the db or not. Just prior to your query, do an fopen of a log file in /tmp, then fwrite the result of a microtime() into the file. Just after your query, fwrite the result of another microtime(). Run your script once (see the sketch after the list) and you will be able to see the following:

1) Are you even getting to the pre-query spot?

2) Is the script failing in the middle of the query?

3) How long does the query take, if it doesn't crash the script?
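
A minimal sketch of that logging (assumes the old mysql_* API and the query from the question):

    // Timestamp before and after the query so the log shows where the
    // script dies and how long the query itself takes.
    $log = fopen('/tmp/query_timing.log', 'a');
    fwrite($log, 'before query: ' . microtime(true) . "\n");

    $result = mysql_query("SELECT username, userscore FROM users WHERE userscore < 50");

    fwrite($log, 'after query: ' . microtime(true) . "\n");
    fclose($log);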

Zak
+2  A: 

Since you haven't provided any table or index information, here's what I'm going to suggest.

  • Make sure there's an index on userscore. If you have more than a million rows in your table and you're doing a query with "WHERE userscore < 50", that column needs an index.
  • Make sure your query is using that index. Run your query manually with EXPLAIN prepended, i.e. EXPLAIN SELECT username, userscore FROM users WHERE userscore < 50, and optimize based on the results.
  • You haven't mentioned how you're doing the id comparison, so I'll assume it's a loop that checks each row against the array. You might be better off putting all 1,000 ids into the query itself, limiting your SELECT to users with score < 50 AND an id in that set (see the sketch below).
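
For example (a sketch only; I'm assuming an integer id column, so adjust names to your schema):

    // $otherIds holds the ~1,000 ids to compare against.
    // intval() keeps the generated SQL safe for integer ids.
    $idList = implode(',', array_map('intval', $otherIds));

    $sql = "SELECT username, userscore
            FROM users
            WHERE userscore < 50
              AND id IN ($idList)";

    $result = mysql_query($sql);
    while ($row = mysql_fetch_assoc($result)) {
        // Only users in both sets come back, so no PHP-side loop is needed.
    }

That moves the entire comparison into the database, where the indexes can do the work.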

If you post more information about your tables, indexes, and comparisons, I can probably be more specific.

zombat
I set an index on userscore and my query is using that index. The problem with inserting the ids is that they could potentially go up to 1 million or more.
Tim
That's not a problem in and of itself. InnoDB tables can easily handle millions of rows, and as long as you are using an index, your SELECT statements should be fine.
zombat
I'm currently using ISAM. Would I be better off using InnoDB and then inserting all the ids into the MySQL select statement?
Tim
ISAM, or MyISAM? MyISAM tables are actually faster for reads in most situations, but of course you can't do transactions with them. I guess I'm still unclear on where your exact issue lies. If you run your query manually and it takes exceptionally long, then your problem is in the query. If not, then it's probably in your comparison, and cutting down the result set will help.
zombat
Sorry, Friday distractions. MyISAM. I tried your query-building idea using the ids and it actually works great, and even if the set grows larger, it still targets a very exact set of data (both the usernames and scores now have indexes on them).
Tim
That's a good way to do it, then. Cutting out data makes everything faster, both on the database side and on the front end.
zombat