views:

136

answers:

1

I have a MySQL database with a fairly large table where the products are. Each of them has its own id and categoryId field where there is a category id belongs to this product. Now I have a query that pulls out products from given categories such as:

SELECT * FROM products WHERE categoryId IN ( 1, 2, 3, 4, 5, 34, 6, 7, 8, 9, 10, 11, 12 )

Of course, come a WHERE clause and ORDER BY sort but not in this thing. Let's say that these products is 250k and the visits are over 100k per day. Under such conditions in the table slow_log registered weight of these queries with large generation time.

Do you have any ideas how to optimize the given problem?

Table engine is MyISAM.

+2  A: 

Index on categoryId won't help in this case, IN (...) queries will produce sequence scan instead of index lookup anyway.

I would consider first redesigning the system to get rid of multiple category select and if it is not appropriate, caching query results.

For example, you can create a helper table items_category_groups(hash, item_id) and after client query on multiple categories hash their combined ids and lookup this table. If not found, make an expensive query and fill this table. If found, make a cheap query joining these tables. Other caching tools like memcached will work too.

Andrey
good idea but this query is used to pulls out product for given category and its subcategories (whole branch of this category) so it is imposible
Arkadiusz Kondas
Then caching is the only way I see, although the cache size will be large (250k * count of category combinations), so I would store them in DB, not in memcache or something.
Andrey
If you just want to pull out products from ONE category (and its subcategories), you can create a helper table (category_id, item_id) and fill it with all category_id-item_id pairs including subcategories. This table won't be very large (say, if you have category nesing level of 4, it will contain not more than 1m*number of categories two-int rows, it's not a big deal). Then you will fetch products with a single fast query using indexes.
Andrey
but this is the same. I will also must use IN on the helper table
Arkadiusz Kondas
No.For example, you have category 1 with subcategory 2 and 2 has a subcategory 3.Item 1 belongs to category 3.Then you will have 3 rows for one item in this table:1 1 |1 2 |1 3And if you want to select all items from, say, category 2, you write, roughly:SELECT items.* FROM items i, item_categories ic WHERE i.id=ic.item_id AND ic.category_id=2;And it will be equal toSELECT * FROM items WHERE category_id IN (2, 3);Of course, if you want to select items from two not related categories, this won't work, but in case of nested categories it will work just fine.
Andrey
thx for anserw and example i will try and check results
Arkadiusz Kondas
after my reflection: when i want add product ,for example, to category 3 i must put in item_categories 3 rows ? 1 1 | 1 2 | 1 3 ?If yes the save proccess will take longer and i must generate tree for each save.
Arkadiusz Kondas
sure, but this is just one time on save, and even these 3 queries may work faster than one select with IN(). If you have much more selects than inserts, I guess this is worth it.
Andrey