views:

45

answers:

3

I've got a question to which I've had opposing pieces of advice, would appreciate additional views.

My site has users, each with a user_id. These users can view products, and I need to keep track of the unique instances of users viewing specific products. To record a view in a separate views table, I've currently got two options:

OPTION 1:

view_id (INT,PK) | user_id (INT,FK) | product_id (INT,FK) | view_date

... and create a unique constraint over the two middle columns for easy updating with ON DUPLICATE KEY. If the same view already exists, I just update view_date. If not, I write a new row.

OPTION 2:

user_product (VARCHAR20,PK) | view_date

... merge the two ids into a VARCHAR with a separator in the middle, and use the primary key column for easy updating with ON DUPLICATE KEY in the same way as above.

The structure should accommodate up to approx. million unique views. Any thoughts on which option might be better or worse, and why? Big thanks in advance.

EDIT: Thanks for the answers, seems like there's a consensus. Was leaning to the same side but just needed the reassurance.

+2  A: 

I like the first option better - in general, its good to maintain as much atomicity as possible. If you ever want to query for all of a user's views, or something like that, it would be more difficult to do after merging two columns into one (you would need to use LIKE with a wildcard match, which will never be as fast as an indexed single-valued column). You also lose the ability to index on different fields.

Also, there is no reason why you couldnt have a primary or unique key that involved multiple columns, so I see no advantage to option 2. To perform your update, just use REPLACE (documentation) instead of INSERT - this will allow you to easily maintain your invariant of having only one row per user/product combination.

danben
Yep... although it's my understanding that REPLACE "spends" an auto-increment INT id (as it's delete then rewrite), so the PK column data type would need to accommodate that.
Tom
+1  A: 

I think that the first option is your better choice. Later down the line I think it will make querying for different things a bit easier. Queries will likely be faster as well since there won't be string manipulation involved. Further, you can have a primary key over multiple columns if you need.

Bryan Ward
+1  A: 

Definitely go for the first option. The second option will mean many queries from hell if you need to make reports to look for particular groups of users (get me all users that often view product X and product Y so we can offer them a discount), same for looking for specific groups of products (which products are often viewed by the same users, so we can launch a discount promotion)

I understand that it is not a requirement to remember all individual views. But I would certainly capture the number of times they visited the product - this is almost free, as you can keep a running total (insert 1 , on duplicate key update view_count = view_count + 1)

Roland Bouman
view_count... thanks, although I won't need it now, will probably add it, good piece of advice.
Tom