ansaurus

Question

Is using IN (...) the most efficient way to randomly access a MySQL table?

Answer 1

A:

Hell yes, you should add an index. But if the id is already the primary key, then it is indexed automatically.
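
A quick way to check whether the column is already indexed, as a sketch: archive_table is a placeholder name, since the question does not give the real one.

```sql
-- Lists every index on the table; a row with Key_name = 'PRIMARY'
-- and Column_name = 'id' means the id is already indexed.
SHOW INDEX FROM archive_table;
```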

GoatRider 2009-05-07 14:35:47

Answer 2

+2 A:

Add an index on the ID column and (optionally) define it as UNIQUE. This helps MySQL locate the rows you want quickly, because the index holds the IDs in sorted order. Even if the table itself happens to be sorted, e.g. because you insert in increasing ID order, MySQL does not know that and, without an index, will do a full table scan to find the matching records for your queries.

With the index, on the other hand, the search becomes very easy for the server. Only if you ask for really, really many rows at once (a very long IN() clause) might the optimizer decide that you want more than about 30% of the data, in which case it will fall back to a linear scan to avoid excessive disk seeking.

However, with several million rows this would be a hell of a long condition :)
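
A minimal sketch of adding the index and checking that the optimizer uses it. The table name archive_table, the index name idx_archive_id, and the sample ids are all made up for illustration.

```sql
-- Add a unique index on the id column (drop UNIQUE if ids can repeat).
ALTER TABLE archive_table
    ADD UNIQUE INDEX idx_archive_id (id);

-- Ask the optimizer how it would run the IN() lookup.
-- With a usable index, the "key" column of the output should name
-- idx_archive_id; type = ALL would indicate a full table scan.
EXPLAIN
SELECT *
  FROM archive_table
 WHERE id IN ('abc123', 'def456', 'ghi789');
```

If EXPLAIN still reports a full scan for a very long list, that is the fallback described above kicking in.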

I'd also recommend reconsidering whether the column really has to be 255 characters long. Even though VARCHAR will not use that much space when you don't need it, it sounds like a questionable design. Whether it should be a numeric field or not may depend on your needs; however, a numeric key is usually recommended.

Daniel Schneller 2009-05-07 14:50:53

The 'id' column is an alphanumeric string of variable length; however, it gets nowhere near 255 characters long.

rjstelling 2009-05-07 15:06:56

Answer 3

+3 A:

Since you have a file of the ids you want, I recommend importing it into a work table and then joining that table to your production table to get the results you want. Of course, before you do anything, you need to implement an index strategy.
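
A minimal sketch of the work-table approach in MySQL. The names wanted_ids and archive_table and the file path are placeholders, the file is assumed to hold one id per line, and LOAD DATA LOCAL only works if local_infile is enabled on both client and server.

```sql
-- Hypothetical work table holding the ids read from the file.
CREATE TEMPORARY TABLE wanted_ids (
    id VARCHAR(255) NOT NULL,
    PRIMARY KEY (id)
);

-- Bulk-load the file (one id per line; the path is a placeholder).
LOAD DATA LOCAL INFILE '/path/to/ids.txt'
    INTO TABLE wanted_ids
    LINES TERMINATED BY '\n'
    (id);

-- Join against the production table; archive_table.id should be indexed.
SELECT a.*
  FROM archive_table a
  JOIN wanted_ids w
    ON w.id = a.id;
```

The temporary table disappears when the connection closes, so nothing has to be cleaned up afterwards.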

HLGEM 2009-05-07 15:08:06

Answer 4

A:

From what I've been led to understand by our DBA in the past, the "IN" clause has a limit on how many explicit IDs can be specified within the brackets. I was informed this doesn't apply if you can use a SELECT to feed the IN list.
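
For what it's worth, the MySQL manual says the number of values in an IN() list is limited only by max_allowed_packet, i.e. by the size of the statement rather than by a fixed count. A sketch of checking that setting and of feeding IN from a SELECT instead; wanted_ids and archive_table are hypothetical table names.

```sql
-- Maximum size of a single statement, in bytes.
SHOW VARIABLES LIKE 'max_allowed_packet';

-- IN fed by a subquery instead of an explicit list.
-- wanted_ids is assumed to be a table holding the ids to look up.
SELECT a.*
  FROM archive_table a
 WHERE a.id IN (SELECT w.id FROM wanted_ids w);
```

Older MySQL releases were known to handle IN (SELECT ...) poorly, so it is worth comparing its EXPLAIN output against a join on your version.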

HLGEM's suggestion of importing and using a join to the table is probably the easiest approach. And, as mentioned before, indexing will improve the performance.

Jay S 2009-05-07 16:26:00

Answer 5

A:

Yes, add an index over both tables (2.4mil and 10,000).

Assuming transaction_table has 10,000 rows, archive_table has 2.4mil rows, and you've built an index over archive_table, you could code:

SELECT id
  FROM transaction_table a
 WHERE EXISTS( SELECT *
                 FROM archive_table b
                WHERE a.id = b.id )

Using an EXISTS clause instead of a JOIN is arguably more readable and, with an index on archive_table.id, should perform about the same as the join.
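
For comparison, the join form of the same query might look like the sketch below. It returns the same ids only if archive_table.id is unique, which is an assumption here; with duplicate ids in archive_table the join would repeat rows, whereas EXISTS would not.

```sql
-- Join equivalent of the EXISTS query.
-- Assumes archive_table.id is unique; otherwise use SELECT DISTINCT.
SELECT a.id
  FROM transaction_table a
  JOIN archive_table b
    ON a.id = b.id;
```

Running EXPLAIN on both forms shows whether the server plans them the same way on your version.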

Paul Morgan 2009-05-07 20:20:58
