views: 195
answers: 4
To set the background, I'm interested in capturing implicit signals of interest in books as users browse around a site. The site is written in Django (Python) using MySQL, memcached, nginx, and Apache.

Let's say, for instance, my site sells books. As a user browses around my site I'd like to keep track of which books they've viewed, and how many times they've viewed them.

Not that I'd store the data this way, but ideally I could have on-the-fly access to a structure like:

{user_id : {book_id: number_of_views, book_id_2: number_of_views}}

I realize there are a few approaches here:

  • Some flat-file log
  • Writing an object to a database every time
  • Writing to an object in memcached

I don't really know the performance implications, but I'd rather not write to the database on every single page view. Writing to a log and computing the structure later seems too slow to give good recommendations on the fly as the user browses the site. The memcached approach seems fine, but there's a cost to keeping this object only in memory: you might lose it, and it never gets written somewhere 'permanent'.

What approach would you suggest? (doesn't have to be one of the above) Thanks!

+1  A: 

What approach would you suggest? (doesn't have to be one of the above) Thanks!

Hmmm... this is like being in a four-walled room with only one door and saying you want to get out of the room, but not through the only door...

There was an article I was reading some time back (can't find the link now) that says memcached can handle huge sets of data in memory with very little degradation in performance (Facebook uses it). My advice is to explore memcached further; I think it will do the trick.

gath
True, I could see using memcached - just wondering if it's the only way out (or the best way out?)
sotangochips
+1  A: 

Either a document datastore (MongoDB/CouchDB) or a persistent key-value store (Tokyo Cabinet, MemcacheDB, etc.) may be worth exploring.

No definite recommendations from me as the final solution depends on multiple factors - load, your willingness to learn/deploy a new technology, size of the data...
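
For a rough idea of the shape this takes in a document store, here is a sketch using pymongo. The database/collection names ("bookstore", "book_views") and the "views" field are made up for illustration:

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    views = client.bookstore.book_views  # hypothetical database/collection

    def record_view(user_id, book_id):
        # One document per (user, book); $inc creates the counter on first view.
        views.update_one(
            {"user_id": user_id, "book_id": book_id},
            {"$inc": {"views": 1}},
            upsert=True,
        )

    def views_for_user(user_id):
        # Rebuild the {book_id: number_of_views} mapping on the fly.
        return {d["book_id"]: d["views"] for d in views.find({"user_id": user_id})}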

Chirayu Patel
A: 

Seems to me that one approach could be to use memcached to keep the counter, but have a cron running regularly to store the value from memcached to the db or disk. That way you'd get all the performance of memcached, but in the case of a crash you wouldn't lose more than a couple of minutes' data.
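
Roughly what I mean, as a sketch using Django's cache API with a memcached backend (the key format is an assumption; the cron job would read these keys back and persist them to the database):

    from django.core.cache import cache

    def record_view(user_id, book_id):
        key = "views:%s:%s" % (user_id, book_id)
        cache.add(key, 0)   # no-op if the counter already exists
        cache.incr(key)     # memcached increments atomically

    def current_count(user_id, book_id):
        # What the periodic cron job would read before writing to the db/disk.
        return cache.get("views:%s:%s" % (user_id, book_id), 0)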

Daniel Roseman
+3  A: 

If this data is more than an unimportant statistic that might or might not be available, I'd suggest taking the simple approach and using a model. Yes, it will hit the database every time.

Unless you are absolutely, positively sure these queries are actually degrading the overall experience, there is no need to worry about them. Even if you optimize this one, there's a good chance other, unexpected queries are wasting more CPU time. I assume you wouldn't be asking this question if you were testing all your other queries, so why risk premature optimization on this one?

An advantage of the model approach is having an API in place. When you have tested and decided to optimize, you can keep this API and swap the underlying model for something else (which will most probably be more complex than a model).

I'd definitely go with a model first and see how it performs (and also how other parts of the project perform).
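
As a minimal sketch of what "go with a model" could look like (the BookView name, the "books.Book" reference and the F()-based increment are just illustrative):

    from django.db import models
    from django.conf import settings

    class BookView(models.Model):
        user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
        book = models.ForeignKey("books.Book", on_delete=models.CASCADE)
        views = models.PositiveIntegerField(default=0)

        class Meta:
            unique_together = ("user", "book")

    def record_view(user, book):
        row, created = BookView.objects.get_or_create(
            user=user, book=book, defaults={"views": 1}
        )
        if not created:
            # F() turns the increment into a single UPDATE, safe under concurrency.
            BookView.objects.filter(pk=row.pk).update(views=models.F("views") + 1)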

muhuk
+1: Just write to the database and stop worrying. The I/O has to happen somewhere, so just do it. Trust the ORM caching and DB caching. Your other stuff (authenticating users, querying stuff for them) totally dominates performance. A log record won't be noticed.
S.Lott