Let's say I have a table named products and I want to know how many times each product was searched for, viewed, and purchased. I also want to know when each of those events happened.

My first approach was to create a table with the product_id, a field indicating the event type (0 = searched, 1 = viewed, 2 = purchased), and another field storing the `datetime` of the event, so I can filter by time.
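A minimal sketch of that first approach, assuming SQLite through Python's `sqlite3` module. The table and column names (`product_events`, `event_type`, `event_time`) and the helper `counts_for` are hypothetical, chosen only for illustration:

```python
import sqlite3

# Event-type codes as described above (the constant names are illustrative).
SEARCHED, VIEWED, PURCHASED = 0, 1, 2

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_events (
        product_id INTEGER NOT NULL,
        event_type INTEGER NOT NULL,  -- 0=searched, 1=viewed, 2=purchased
        event_time TEXT    NOT NULL   -- ISO 8601 timestamp, sortable as text
    )
""")
conn.executemany(
    "INSERT INTO product_events VALUES (?, ?, ?)",
    [
        (1, SEARCHED,  "2010-03-01 09:00:00"),
        (1, VIEWED,    "2010-03-01 09:01:00"),
        (1, PURCHASED, "2010-03-01 09:05:00"),
        (2, SEARCHED,  "2010-03-02 10:00:00"),
        (1, SEARCHED,  "2010-04-15 12:00:00"),
    ],
)

def counts_for(product_id, start, end):
    """Return {event_type: count} for one product within [start, end)."""
    rows = conn.execute(
        """
        SELECT event_type, COUNT(*)
        FROM product_events
        WHERE product_id = ? AND event_time >= ? AND event_time < ?
        GROUP BY event_type
        """,
        (product_id, start, end),
    ).fetchall()
    return dict(rows)

march = counts_for(1, "2010-03-01", "2010-04-01")
```

Here `march` holds the per-type counts for product 1 in March 2010, so one query answers both "how many times" and "when".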

This works pretty well, but it is NOT scalable. If I have 50,000 products in the database and 1,000 users each making 5 searches every day, then I have 50,000 * 1,000 * 5 = 250,000,000 new records per day, so this does not look like the perfect solution to me.

I have a few ideas about how to enhance this but I'd really like to read about better ways, since I'm not happy with mine.

A: 

Keep storing this data (storage is cheap and relatively scalable, as long as you rarely have to access it).

Aggregate what is interesting for you.

Once you know which statistics are interesting for you, you can generate these incrementally using aggregates of the minimum time span of interest. To take a simple example: if you are interested in the total sales count for an item, but only on a yearly basis, you can aggregate "sales in 2010", "sales in 2009". Work with these aggregates whenever possible.

Still, using the original data, you can generate new aggregates if you discover that another metric becomes interesting.
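The yearly example above could be sketched as follows. This is only an illustration, assuming SQLite through Python's `sqlite3` module; the tables `purchases` and `yearly_sales` and the helpers `rollup_year` and `sales_in` are hypothetical names, not part of the original question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw event data: kept forever, rarely read.
    CREATE TABLE purchases (product_id INTEGER, purchased_at TEXT);
    -- Aggregate: one row per product per year, cheap to query.
    CREATE TABLE yearly_sales (
        product_id INTEGER,
        year       TEXT,
        sales      INTEGER,
        PRIMARY KEY (product_id, year)
    );
""")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "2009-06-01 10:00:00"),
        (1, "2009-11-20 16:30:00"),
        (1, "2010-02-14 08:15:00"),
        (2, "2010-07-04 12:00:00"),
    ],
)

def rollup_year(year):
    """Rebuild the aggregate rows for one year from the raw purchases."""
    conn.execute("DELETE FROM yearly_sales WHERE year = ?", (year,))
    conn.execute(
        """
        INSERT INTO yearly_sales (product_id, year, sales)
        SELECT product_id, ?, COUNT(*)
        FROM purchases
        WHERE strftime('%Y', purchased_at) = ?
        GROUP BY product_id
        """,
        (year, year),
    )

def sales_in(product_id, year):
    """Read a total from the aggregate table, never from the raw data."""
    row = conn.execute(
        "SELECT sales FROM yearly_sales WHERE product_id = ? AND year = ?",
        (product_id, year),
    ).fetchone()
    return row[0] if row else 0

rollup_year("2009")
rollup_year("2010")
```

Because the raw `purchases` rows are kept, a different aggregate (monthly, per user, and so on) can be built later with another rollup of the same shape.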

relet
A: 

*If I have 50,000 products in the database and 1,000 users each making 5 searches every day, then I have 50,000 * 1,000 * 5 = 250,000,000 new records per day, so this does not look like the perfect solution to me.*

This calculation seems incorrect to me. Why would you store 50,000 records for each user each day? Even in the views-per-product-per-user case, you would have one master table for all products, and when a user actually views a product, you would insert a single row with the following details.

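CREATE TABLE product_views (
    product_id NUMBER,        -- refers to the product parent table
    user_id    VARCHAR2(50),  -- refers to the users parent table
    view_time  DATE           -- when the view happened
);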

The columns product_id and user_id refer to the parent tables product and users respectively, which hold the detailed descriptions.

So, in the scenario you provided, there would be 5,000 searches per day (1,000 users * 5 searches each), and therefore 5,000 inserts into this table per day. The number of products does not enter the calculation.
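To check that row count, here is a rough sketch that simulates one day of activity. It assumes SQLite through Python's `sqlite3` module rather than the Oracle types used above, and the generated product names, user IDs, and view pattern are made up for illustration:

```python
import sqlite3

NUM_PRODUCTS, NUM_USERS, VIEWS_PER_USER = 50_000, 1_000, 5

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_views (
        product_id INTEGER REFERENCES products(product_id),
        user_id    TEXT,
        view_time  TEXT
    );
""")

# Master table: one row per product, written once.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    ((p, f"product-{p}") for p in range(1, NUM_PRODUCTS + 1)),
)

# Event table: one row per actual view, regardless of catalogue size.
conn.executemany(
    "INSERT INTO product_views VALUES (?, ?, ?)",
    (
        ((u * VIEWS_PER_USER + v) % NUM_PRODUCTS + 1, f"user-{u}", "2010-03-01 09:00:00")
        for u in range(NUM_USERS)
        for v in range(VIEWS_PER_USER)
    ),
)

rows_per_day = conn.execute("SELECT COUNT(*) FROM product_views").fetchone()[0]

# Joining back to products gives per-product statistics.
top = conn.execute("""
    SELECT p.name, COUNT(*) AS views
    FROM product_views v
    JOIN products p ON p.product_id = v.product_id
    GROUP BY p.product_id
    ORDER BY views DESC, p.product_id
    LIMIT 3
""").fetchall()
```

`rows_per_day` equals `NUM_USERS * VIEWS_PER_USER`, and the join shows how the event rows still yield statistics about individual products.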

Rajesh
But this table will tell me about the searches and not about the products.
Erik Escobedo
Yes, that is what you are interested in capturing. Once you have these details, you can always join it with the product table to get the products on which the person was searching.
Rajesh