I'm sure there's an easy answer here and I'm just overcomplicating things, but I am writing a Django module responsible for storing and tracking user usage statistics, which can then be displayed to users on certain pages.

The simplest example might be tracking which products a user has purchased and looked at, so that when someone else is looking at a product we can display "people who bought this also bought...", etc.

My uncertainty is about the best data model. I suspect it might not be efficient to write to a table every time someone looks at a product, but I do need to display this data to users in some form. I'm looking for a strategy that is easy to manage and efficient from the bottom up. Any suggestions?

EDIT: By 'efficient from the bottom up' I basically mean a solution that is efficient both to store and to retrieve. I probably should have just said that :)

Also, to add another complication, let's say I want to track viewing relationships between products rather than simply logging that an individual product was viewed. For example, I might want to show on product A's page that people who viewed product A also viewed products B, C and D. Following some of the comments below, I was thinking about creating a table / Django model with two simple fields (product_name and last_viewed_date), so that I can run a job to consolidate all views of a single product into one row (with last_viewed_date taking the date of the most recent record)... BUT if I also want to store the history of each of those views, as explained above, how might I do that?

A: 

Storing the views in database tables would be the best approach. That way you can recommend similar items based on what people browse.

Sometimes people log into the site and sometimes they just browse, so you would need one table keyed by session-id and another keyed by user-id to track the items browsed.

You would also want to categorize items and only display related ones. That way, if one customer looked at gardening tools and iPhones, you wouldn't suggest a rake to the next person looking for an iPhone.

Answer edited for comments

You can retrieve data from a database much faster than you can from a log file. With proper indexing and sensible normalization / denormalization of your structures, you will be able to do this for a huge number of rows. A database scales much better than text files.

If you use an INSERT-style model (one new row per view) rather than an UPDATE model (one counter row per product that is repeatedly modified), you will also deal with less contention. You will have to build an archiving mechanism to keep the table from growing unchecked if you are restricted by space.
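As a rough illustration of the INSERT-style idea, here is a minimal sketch using Python's built-in sqlite3 module rather than Django models, so it can be run on its own. The table name `product_view`, its columns, and the helpers `make_db`, `record_view` and `also_viewed` are all made up for this example. It also assumes a single table with a nullable `user_id` instead of separate session and user tables, which is a simplification of what is described above.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical append-only view log."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_view (
            id         INTEGER PRIMARY KEY,
            session_id TEXT NOT NULL,
            user_id    INTEGER,              -- NULL for anonymous browsing
            product_id TEXT NOT NULL,
            viewed_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        -- Indexes so lookups by product or by session do not scan the table.
        CREATE INDEX idx_view_product ON product_view (product_id);
        CREATE INDEX idx_view_session ON product_view (session_id);
    """)
    return conn


def record_view(conn, session_id, product_id, user_id=None):
    """Append one row per view; existing rows are never updated."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO product_view (session_id, user_id, product_id) "
            "VALUES (?, ?, ?)",
            (session_id, user_id, product_id),
        )


def also_viewed(conn, product_id, limit=5):
    """Products seen in the same sessions as product_id, most common first."""
    rows = conn.execute(
        """
        SELECT b.product_id, COUNT(DISTINCT b.session_id) AS sessions
        FROM product_view AS a
        JOIN product_view AS b
          ON b.session_id = a.session_id
         AND b.product_id <> a.product_id
        WHERE a.product_id = ?
        GROUP BY b.product_id
        ORDER BY sessions DESC, b.product_id
        LIMIT ?
        """,
        (product_id, limit),
    )
    return rows.fetchall()
```

Because each view keeps its session, the "people who viewed A also viewed..." list falls out of a self-join on `session_id`. On a large table that query would probably be run periodically and cached rather than on every page load.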

Personally, I would go with an RDBMS rather than a text file any day for this scenario.

Raj More
But let's say that a typical user might look at 5 products and we want to track each of those 5 views. Multiply that by thousands of users and that is a lot of writes to the database. Do you think there is a more efficient way to store this data? Perhaps write to a log file first and then have that collated into a db table once a day?
Benjamin Dell
@Benjamin Dell - answer edited for comments
Raj More
Yep, maybe you're right. I'm wondering how I might archive data at a later date. Perhaps by creating a 'qty' field I could simply merge similar records together at the end of each day (i.e. if I have 10 records showing that product A was looked at, those could be consolidated into one row, with the date set to the latest one and the qty field updated to reflect the total number of views of that product). Does that sound sensible? PS Thanks also to Hank for offering your suggestion for a view decorator.
Benjamin Dell
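The end-of-day consolidation described in the comment above could look roughly like the following. This is only a sketch under assumptions: it uses sqlite3 instead of Django models, the table names `product_view` and `product_view_daily` and the function `rollup` are invented for the example, and timestamps are assumed to be stored as ISO 8601 strings.

```python
import sqlite3


def make_db():
    """In-memory database with a raw view log and a hypothetical daily summary."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_view (
            product_id TEXT NOT NULL,
            viewed_at  TEXT NOT NULL          -- ISO 8601, e.g. 2024-01-01T10:00:00
        );
        CREATE TABLE product_view_daily (
            product_id     TEXT NOT NULL,
            day            TEXT NOT NULL,
            qty            INTEGER NOT NULL,  -- number of views that day
            last_viewed_at TEXT NOT NULL,     -- most recent view that day
            PRIMARY KEY (product_id, day)
        );
    """)
    return conn


def rollup(conn, before_day):
    """Merge raw views older than before_day into one row per product per day."""
    with conn:  # both statements run in a single transaction
        conn.execute(
            """
            INSERT INTO product_view_daily (product_id, day, qty, last_viewed_at)
            SELECT product_id, date(viewed_at), COUNT(*), MAX(viewed_at)
            FROM product_view
            WHERE date(viewed_at) < ?
            GROUP BY product_id, date(viewed_at)
            """,
            (before_day,),
        )
        # Raw rows are removed only after their summary has been written.
        conn.execute(
            "DELETE FROM product_view WHERE date(viewed_at) < ?",
            (before_day,),
        )
```

Two caveats with this sketch: a raw row that arrives late for an already summarized day would violate the primary key on the next run, so a real job would need to handle that case. Also, the summary keeps only counts, so any "also viewed" relationships have to be computed from the raw rows before they are deleted.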
A: 

An RDBMS can handle a lot more writes than most people seem to realize. Unless you are replacing an existing implementation that already handles massive amounts of traffic, it's better to code for clarity now and change the implementation later if it can't keep up. Ideally, you'd do this in some cleanly designed way (a decorator on the view functions you want to track, perhaps?) so that later you can swap in a different implementation without affecting all of your code.
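To make the decorator idea concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: `track_product_view`, `set_backend`, `InMemoryBackend` and `product_detail` are names invented for the example, and a plain dict stands in for the request. In a real Django view the first argument would be an `HttpRequest`, and the session key would come from `request.session`.

```python
import functools


class InMemoryBackend:
    """Simplest possible storage; a database-backed class could replace it."""

    def __init__(self):
        self.events = []

    def record(self, session_id, product_id):
        self.events.append((session_id, product_id))


# The backend in use. Views never reference it directly, only the decorator does.
_backend = InMemoryBackend()


def set_backend(backend):
    """Swap the storage implementation without touching any view code."""
    global _backend
    _backend = backend


def track_product_view(view_func):
    """Record a product view after the wrapped view has run successfully."""

    @functools.wraps(view_func)
    def wrapper(request, product_id, *args, **kwargs):
        response = view_func(request, product_id, *args, **kwargs)
        # Looked up at call time, so set_backend() affects already-decorated views.
        _backend.record(request.get("session_id"), product_id)
        return response

    return wrapper


@track_product_view
def product_detail(request, product_id):
    # Stand-in for a view; a Django view would return an HttpResponse.
    return f"product page for {product_id}"
```

With this shape, moving from direct database inserts to a log file or a queue would mean writing another class with a `record` method and calling `set_backend` once at startup.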

Hank Gay