A web app I'm working on requires frequent parsing of diverse web resources (HTML, XML, RSS, etc.). Once downloaded, I need to cache these resources to minimize network load. The app requires a very straightforward cache policy: only re-download a cached resource when more than X minutes have passed since the access time.

Should I:

  1. Store both the access time (e.g. 6/29/09 at 10:50 am) and the resource itself in the database.
  2. Store the access time and a unique identifier in the database. The unique identifier is the filename of the resource, stored on the local disk.
  3. Use another approach or third party software solution.

Essentially, this question can be re-written as, "Which is better for storing moderate amounts of data - a database or flat files?"

Thanks for your help! :)

NB: The app is running on a VPS, so size restrictions on the database/flat files do not apply.

+1  A: 

It depends on the platform. If you are using .NET, the answer is 3: use the Cache object, which is ideally suited for this in ASP.NET.

You can set both time-based and dependency-based expiration. This article explains the Cache object:

http://articles.techrepublic.com.com/5100-10878_11-5034946.html

Stuart
I believe this is a language-agnostic question.
jimyi
@jimyi - so you downvoted? I gave the answer for the case where ASP.NET is used. Some technology has to be chosen; it just wasn't specified.
Stuart
Stuart, I've up-voted your response, so your comment is back to zero. Thanks for your help - any other thoughts are always appreciated! :)
@rinogo - thanks
Stuart
+1  A: 

To answer your question: "Which is better for storing moderate amounts of data - a database or flat files?"

The answer is (in my opinion) flat files. Flat files are easier to back up and easier to remove.

However, you have extra information that isn't encapsulated in this question, namely the fact that you will need to access this stored data to determine if a resource has gone stale.

Given this need, it makes more sense to store it in a database. Flat files do not lend themselves as well to random access and search as a relational DB does.
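To make the database option concrete, here is a minimal sketch of option 1 from the question (access time and resource stored together in one table). It is in Python with SQLite purely for illustration, since the question doesn't name a language or a database. The table layout, the function names `open_cache` and `get_from_db`, the 15-minute value for X, and the caller-supplied `download` function are all assumptions made up for this example.

```python
import sqlite3
import time

MAX_AGE_SECONDS = 15 * 60  # assumed value for the question's "X minutes"


def open_cache(path=":memory:"):
    """Open the cache database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        " url TEXT PRIMARY KEY,"
        " fetched_at REAL NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return conn


def get_from_db(conn, url, download, now=None, max_age=MAX_AGE_SECONDS):
    """Return the cached body for url, re-downloading it only when stale."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT fetched_at, body FROM cache WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and now - row[0] <= max_age:
        return row[1]  # still fresh: no network access needed
    body = download(url)  # hypothetical fetch function supplied by the caller
    conn.execute(
        "INSERT OR REPLACE INTO cache (url, fetched_at, body) VALUES (?, ?, ?)",
        (url, now, body),
    )
    conn.commit()
    return body
```

The staleness check is a single primary-key lookup, which is the kind of random access the answer above has in mind.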

Alan
RDBMSs also handle concurrent reads and writes better.
BaroqueBobcat
+1  A: 

Neither.

Have a look at memcached to see if it works with your server/client platform. It is easier to set up and performs much better than filesystem- or RDBMS-based caching, provided you can spare the RAM needed for the data being cached.

Lars Haugseth
Hi Lars, great answer. Unfortunately, memcached doesn't really work for my current situation. RAM is a limited resource on the VPS, and since I'm storing moderately sized objects, I'd exhaust all available RAM pretty quickly. :/ (I've still voted your answer up, though - thanks for the help)
Tugela Cache (http://meta.wikimedia.org/wiki/Tugela_Cache) seems like a nice alternative, although it appears to be abandonware (no longer actively developed; forked from memcached at version 1.2).
A: 

All of the proposed solutions are reasonable. However, for my particular needs, I went with flat files. Oddly enough, though, I did so for reasons not mentioned in the other answers. It doesn't really matter to me that flat files are easier to back up and remove, and both DB and flat-file solutions allow for easy checking of whether or not the cached data has gone stale. I went with flat files first and foremost because, on my mid-sized one-box VPS LAMP architecture, I think they will be faster than a third-party cache or DB-based solution.
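For anyone curious what a flat-file version of this policy can look like, here is a rough sketch. It is not the code I actually run, and it is in Python only for illustration. It assumes the file's modification time can stand in for the access time, so no database is needed at all. The names `cache_path` and `get_from_disk`, the SHA-256 filename scheme, the 15-minute value for X, and the `download` callback are all invented for this example.

```python
import hashlib
import os
import tempfile
import time

MAX_AGE_SECONDS = 15 * 60  # assumed value for "X minutes"


def cache_path(cache_dir, url):
    """Map a URL to a filesystem-safe filename (hypothetical naming scheme)."""
    return os.path.join(cache_dir, hashlib.sha256(url.encode("utf-8")).hexdigest())


def get_from_disk(cache_dir, url, download, now=None, max_age=MAX_AGE_SECONDS):
    """Return the cached body for url, re-downloading it only when stale."""
    now = time.time() if now is None else now
    path = cache_path(cache_dir, url)
    try:
        # The file's mtime doubles as the stored access time.
        if now - os.path.getmtime(path) <= max_age:
            with open(path, "rb") as f:
                return f.read()
    except FileNotFoundError:
        pass  # not cached yet (or removed meanwhile): fall through and fetch
    body = download(url)  # hypothetical fetch function supplied by the caller
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temp file and rename, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(body)
    os.utime(tmp, (now, now))  # record the fetch time as the file's mtime
    os.replace(tmp, path)
    return body
```

Whether this is actually faster than a DB or a third-party cache on a given box is something to measure rather than assume.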

Thanks to all for your thoughts! :)