I have a list of products {P1, P2, ...} each of which can have a list of attributes {a1, a2, ...}. What's the fastest algorithm to find all elements NOT having some attributes say {a2, a6, a10}?
If P1 = {a1, a2, a3} P2 = {a3} P3 = {a1, a4} , the algorithm should return {P2, P3}
The issue is I don't know the input list of attributes is as this is passed in by the user. The list of products and their associated attributes are stored in the database:
Product Table (Have more than 10000 rows)
ProductID int,
ProductName varchar
Attribute Table (Have about 400 rows, might grow in the future)
AttributeID int,
AttributeName varchar
Product_Attribute_Association Table
ProductID int,
AttributeID int
The query I have is:
SELECT p.ProductID, p.ProductName
FROM Product p
WHERE p.ProductID NOT IN
(SELECT pa.ProductID FROM Product_Attribute_Association pa
WHERE pa.AttributeID NOT IN (1, 4, 5) -- What ever being passed in
) t
This service will get hit pretty hard and I am thinking about caching the data of the 3 tables in memory in some data structure and write an efficient algorithm to do the lookup. Can you please suggest something I should look into? Thanks
EDIT: Updating the database is not an issue. The cache will be rebuilt from the database every hour, so the time that builds the cache is less of a concern.
Memory is not a concern either.