In our Ruby on Rails project we have a lot of categorization criteria for recipes, such as cook method, occasion etc. Every recipe belongs to one or several of these categories. When someone starts browsing for recipes, he/she can narrow down to a set of particular categories. Then we need to calculate the number of recipes in all categories accessible from this set ("accessible" means there are recipes in that category that also belong to selected categories). This is similar to how Amazon search works: someone enters 'Software' and there is a menu on the left which says "Books (200)", "Movies (300)" etc, so user can go deeper by clicking on these links.
Right now we've implemented it roughly like that:
- Build a set of selected categories from URL;
- Perform a query that fetches category ids from all recipes that fall into currently selected criteria;
- Build the index which maps all category ids to counts of recipes, and render only those that have non-zero counters;
- Store this index in memcached for 24 hours, so we only calculate it once per day for a particular page.
My concern is that if there is a cache miss, building index can take a lot of time. Maybe you have any suggestions how to solve this problem or improve current solution?