views:

288

answers:

6

What is the most efficient solution when you need to record some data on every page view in your application - should you write to a file or write to the database?

Or maybe neither - perhaps you should cache the data in memory or a file and only write it to the database (or file system if you use a memory cache) occasionally?

+1  A: 

That depends.

Ands it really does: it depends on the DBMS and/or the OS+filesystem you use. In other words: your mileage varies.

If you just append data somewhere modern DBMS/OS+filesystems should handle this equally well and fast. Problems arise when you want to change data.

Caching - depends too on what kind of caching granularity you can afford (need to have every stepped logged crash-safe versus potential saving).

Leonidas
+2  A: 

That highly depends on your needs for data safety. If you can afford to lose some data in case of a crash then keeping the data in memory and writing it periodically to a persistent store is certainly the most efficient way to go.

Edit: You mentioned pageviews. In that case I would keep the counters in memory and periodically update a database table (like every minute or so).

lothar
This is just for tracking page views actually. It's not critical data like order information.
+7  A: 

If it's purely recording a small amount of data with no subsequent lookups, straight file I/O is almost guaranteed to be more efficient. You're losing all the advantages of a DBMS though -- indexing, transactional integrity (really, ACID in general), concurrent access, etc..

It almost sounds like you're talking about what amounts to simple logging. If that's the case, and you don't need to do frequent complex queries on the resulting data, you're probably better off with straight file I/O if performance is a serious issue. Be careful of concurrent-write issues, though.

If the properties of an RDBMS are desirable, you might think about using SQLite, which for simplistic loads will get you better performance than most RDBMSs with less overhead, at the cost of some of the benefits (highly concurrent access and availability over the network to other machines are a couple of the "biggies"). It still wouldn't be as fast as straight file I/O in the general case, though.

Your later mention of it being for page view tracking causes me to ask: Are you incrementing a counter, rather than logging data about the page view? If so, I'd strongly suggest going with something like SQLite (doing something like UPDATE tbl SET counter = counter+1). You really don't want to get into the timing issues involved in doing this by hand -- if you don't do it right, you'll start losing counts on simultaneous access (A reads "100", B reads "100", A writes "101", B writes "101"; B should have written 102, but has no way of knowing that).

Nicholas Knight
+3  A: 

Hitting the database is most likely going to be more expensive than writing to a file.

If your pageviews per second are high, and if the data doesn't need to be available in the database right away, then writing to a file and periodically loading the data into the DB will be a more optimal solution.

However it all depends on the nature of the data you're recording per page view and how critical it is to whatever business function it serves.

Kev
+3  A: 

Conceptually, writing to the database is always slower than writing to a file. The database has to write to a file too, with the extra overhead of communication to get the data to the database, so it can write it to a file. Therefore, it must be slower.

That said, databases do disk I/O very well, probably better than you will. Don't be surprised if you find out that a simple file logger is slower than writing it to a database. The database has a lot of I/O optimizations, and has some tricks available that you may not (depending on your web lanaguage and environment).

Don't be surprised if the answer changes over time. When your site is small, logging to a database is very fast. As your site grows, the logging table can become a major pain: It uses a lot of disk space, makes the backups take forever, and consumes all the disk I/O when you try to query it. This is why you should benchmark both methods yourself. Then you can re-test in the future, when conditions change.

Craig Lewis
A: 

Use a hybrid solution like redis its designed for this sort of stuff

Sam Saffron