I have a MySQL 5.1 InnoDB table (customers) with the following structure:

int         record_id (PRIMARY KEY)
int         user_id (ALLOW NULL)
varchar[11] postcode (ALLOW NULL)
varchar[30] region (ALLOW NULL)
..
..
..

There are roughly 7 million rows in the table. Currently, the table is being queried like this:

SELECT * FROM customers WHERE user_id IN (32343, 45676, 12345, 98765, 66010, ...

In the actual query there are currently over 560 user_ids in the IN clause. With several million records in the table, this query is slow!

There are secondary indexes on the table, the first of which is on user_id itself, which I thought would help.

I know that SELECT * is A Bad Thing and this will be expanded to the full list of fields required. However, the fields not listed above are more ints and doubles. There are another 50 of those being returned, but they are needed for the report.

I imagine there's a much better way to access the data for the user_ids, but I can't think how to do it. My initial reaction is to remove the ALLOW NULL on the user_id field, as I understand NULL handling slows down queries?

I'd be very grateful if you could point me in a more efficient direction than using the IN ( ) method.

EDIT Ran EXPLAIN, which said:

select_type = SIMPLE 
table = customers 
type = range 
possible_keys = userid_idx 
key = userid_idx 
key_len = 5 
ref = (NULL) 
rows = 637640 
Extra = Using where

does that help?

+3  A: 

First, check if there is an index on USER_ID and make sure it's used.

You can do that by running EXPLAIN.
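
For example (with a few of the sample IDs from the question):

EXPLAIN SELECT * FROM customers WHERE user_id IN (32343, 45676, 12345);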

Second, create a temporary table and use it in a JOIN:

CREATE TEMPORARY TABLE temptable (user_id INT NOT NULL, PRIMARY KEY (user_id));

SELECT  *
FROM    temptable t
JOIN    customers c
ON      c.user_id = t.user_id;
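
Before the join runs, the temporary table has to be populated with the IDs. A minimal sketch, using the sample values from the question (the real statement would list all ~560):

INSERT INTO temptable (user_id)
VALUES  (32343), (45676), (12345), (98765), (66010);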

Third, how many rows does your query return?

If it returns almost all of the rows, it will just be slow, since it has to pump all those millions of rows over the connection channel to begin with.

NULL will not slow your query down, since the IN condition only matches non-NULL values, which are indexed.

Update:

The index is used, the plan is fine except that it returns more than half a million rows.

Do you really need to put all these 638,000 rows into the report?

Hope it's not printed: bad for rainforests, global warming and stuff.

Speaking seriously, you seem to need either aggregation or pagination on your query.
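
For pagination, a minimal sketch (the page size and ordering column are arbitrary here; each subsequent page re-runs the query with a larger offset):

SELECT  *
FROM    customers
WHERE   user_id IN (32343, 45676, 12345 /* ...the rest of the IDs... */)
ORDER BY record_id
LIMIT   1000 OFFSET 0;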

Quassnoi
Thanks for replying. I'll run an EXPLAIN and post back here. The query returns ~638,000 rows at the moment. I'll try putting the user_ids in a temporary table if you think that'll be faster.
Jaymie
EXPLAIN says: select_type = SIMPLE, table = customers, type = range, possible_keys = userid_idx, key = userid_idx, key_len = 5, ref = (NULL), rows = 637640, Extra = Using where. Does that help?
Jaymie
EXPLAIN is OK, the index is being used. There are just lots of rows you don't seem to need. Aggregate or paginate them: no human being is able to browse over 638,000 rows.
Quassnoi
This is true, but Crystal Reports can. Well, saying that.... ;o)
Jaymie
Maybe it's better to use aggregation on the database side and feed CR with aggregated values, then? If you have 50 columns, then each row will take several kilobytes, and your report file will be several GB long. It will take minutes just to SAVE this file onto the HDD, to say nothing of processing it.
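
For instance, a rough sketch of pushing the aggregation into MySQL (the aggregate column here is made up; the real ones depend on what the report actually totals):

SELECT   user_id, COUNT(*) AS customer_rows
FROM     customers
WHERE    user_id IN (32343, 45676, 12345 /* ...the rest of the IDs... */)
GROUP BY user_id;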
Quassnoi
A: 

You can try inserting the ids you need to query on into a temp table and inner joining the two tables. I don't know if that would help.

Eric Hogue
+1  A: 

Are they the same ~560 ids every time? Or is it a different set of ~560 ids on different runs of the query?

You could just insert your 560 UserIDs into a separate table (or even a temp table), stick an index on that table and inner join it to your original table.

Eoin Campbell
Thanks for replying. They're going to change each time. I really like the temp table idea.
Jaymie
+1  A: 

Is this your most important query? Is this a transactional table?

If so, try creating a clustered index on user_id. Your query might be slow because it still must make random disk reads to retrieve the columns (key lookups), even after finding the records that match (index seek on the user_id index).

If you cannot change the clustered index, then you might want to consider an ETL process (simplest is a trigger that inserts into another table with the best indexing). This should yield faster results.
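
A rough sketch of the trigger idea, assuming a reporting table (customers_report here, a made-up name) indexed to suit the report; a real setup would also need UPDATE and DELETE triggers to keep the copy in sync:

CREATE TRIGGER customers_ai AFTER INSERT ON customers
FOR EACH ROW
  INSERT INTO customers_report (record_id, user_id, postcode, region)
  VALUES (NEW.record_id, NEW.user_id, NEW.postcode, NEW.region);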

Also note that such large queries may take some time to parse, so help it out by putting the queried ids into a temp table if possible.

Jeff Meatball Yang
+2  A: 

"Select *" is not as bad as some people think; row-based databases will fetch the entire row if they fetch any of it, so in situations where you're not using a covering index, "SELECT *" is essentially no slower than "SELECT a,b,c" (NB: There is sometimes an exception when you have large BLOBs, but that is an edge-case).

First things first - does your database fit in RAM? If not, get more RAM. No, seriously. Now, supposing your database is too huge to reasonably fit into RAM (say, > 32GB), you should try to reduce the number of random I/Os, as they are probably what's holding things up.

I'll assume from here on that you're running proper server-grade hardware with a RAID controller in RAID1 (or RAID10 etc.) and at least two spindles. If you're not, go away and get that.

You could definitely consider using a clustered index. In MySQL InnoDB you can only cluster the primary key, which means that if something else is currently the primary key, you'll have to change it. Composite primary keys are ok, and if you're doing a lot of queries on one criterion (say user_id) it is a definite benefit to make it the first part of the primary key (you'll need to add something else to make it unique).
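
A sketch of what that could look like, assuming user_id can be made NOT NULL (see the NULL discussion above) and that rebuilding a 7-million-row table during a maintenance window is acceptable:

ALTER TABLE customers
  MODIFY user_id INT NOT NULL,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (user_id, record_id),
  ADD UNIQUE KEY uk_record_id (record_id);  -- keeps record_id lookups fast (and satisfies AUTO_INCREMENT, if used)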

Alternatively, you might be able to make your query use a covering index, in which case you don't need user_id to be the primary key (in fact, it must not be). This will only happen if all of the columns you need are in an index which begins with user_id.
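
For illustration only (and only worthwhile if the report can be narrowed to the indexed columns, which is not the case with ~50 columns today), a covering-index sketch:

ALTER TABLE customers ADD INDEX ix_user_cover (user_id, postcode, region);

SELECT  user_id, postcode, region
FROM    customers
WHERE   user_id IN (32343, 45676, 12345 /* ...the rest of the IDs... */);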

As far as query efficiency is concerned, WHERE user_id IN (big list of IDs) is almost certainly the most efficient way of doing it from SQL.

BUT my biggest tips are:

  • Have a goal in mind, work out what it is, and when you reach it, stop.
  • Don't take anybody's word for it - try it and see
  • Ensure that your performance test system is the same hardware spec as production
  • Ensure that your performance test system has the same data size and kind as production (same schema is not good enough!).
  • Use synthetic data if it is not possible to use production data (copying production data may be logistically difficult (remember your database is > 32GB); it may also violate security policies).
  • If your query is optimal (as it probably already is), try tuning the schema, then the database itself.
MarkR