Modern databases provide caching support. Most ORM frameworks cache retrieved data too. Why is this duplication necessary?

+5  A: 

Here are a couple of reasons why you may want this:

  • An application caches just what it needs so you should get a better cache hit ratio
  • Accessing a local cache will probably be a couple of orders of magnitude faster than accessing the database due to network latency - even with a fast network
Robert Christie
+15  A: 

Because to get the data from the database's cache, you still have to:

  1. Generate the SQL from the ORM's "native" query format
  2. Do a network round-trip to the database server
  3. Parse the SQL
  4. Fetch the data from the cache
  5. Serialise the data to the database's over-the-wire format
  6. Deserialize the data into the database client library's format
  7. Convert the database client library's format into language-level objects (e.g. a collection of whatevers)

By caching at the application level, you don't have to do any of that. Typically, it's a simple lookup of an in-memory hashtable. Sometimes (if caching with memcache) there's still a network round-trip, but all of the other stuff no longer happens.

Dean Harding
Not to mention slow network links, etc.
Robert Wilson
+3  A: 

Even if a database engine caches data, indexes, or query result sets, it still takes a round-trip to the database for your application to benefit from that cache.

An ORM framework runs in the same space as your application. So there's no round-trip. It's just a memory access, which is generally a lot faster.

The framework can also decide to keep data in cache as long as it needs it. The database may decide to expire cached data at unpredictable times, when other concurrent clients make requests that utilize the cache.

Your application-side ORM framework may also cache data in a form that the database can't return, e.g. as a collection of Java objects instead of a stream of raw data. If you rely on database caching, your ORM has to repeat that transformation into objects on every request, which adds overhead and reduces the benefit of the cache.
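To illustrate the last point, here is a small Python sketch that caches the converted objects rather than the raw rows. The `Customer` class, the `customer` table and the `load_customer` helper are made up for this example, and an in-memory SQLite database stands in for a real server.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    id: int
    name: str

# In-memory SQLite stands in for a real database server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

conversions = 0      # how many times a raw row was turned into an object
_object_cache = {}   # customer id -> Customer

def load_customer(customer_id):
    """Return a Customer, converting the raw row only on a cache miss."""
    global conversions
    cached = _object_cache.get(customer_id)
    if cached is not None:
        return cached
    row = conn.execute(
        "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    conversions += 1
    customer = Customer(id=row[0], name=row[1])
    _object_cache[customer_id] = customer
    return customer

first = load_customer(1)
second = load_customer(1)   # same object back, no query and no conversion
```

The second call hands back the very same object, so neither the query nor the row-to-object conversion is repeated.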

Bill Karwin
+1  A: 

Also, the database's cache might not be as practical as one might think. I copied this from http://highscalability.com/bunch-great-strategies-using-memcached-and-mysql-better-together -- it's MySQL-specific, though.

Given that MySQL has a cache, why is memcached needed at all?

The MySQL cache is associated with just one instance. This limits the cache to the maximum address of one server. If your system is larger than the memory for one server then using the MySQL cache won't work. And if the same object is read from another instance it's not cached.

The query cache invalidates on writes. You build up all that cache and it goes away when someone writes to it. Your cache may not be much of a cache at all depending on usage patterns.

The query cache is row based. Memcached can cache any type of data you want and it isn't limited to caching database rows. Memcached can cache complex objects that are directly usable without a join.
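The point about invalidation on writes can be pictured with a toy model in Python. This is only an illustration of the behaviour described in the quote, not MySQL's actual implementation; both classes and all key names are invented for the example.

```python
class TableLevelCache:
    """Toy model: any write to a table drops every cached result for it."""

    def __init__(self):
        self._results = {}  # (table, query) -> result

    def get(self, table, query):
        return self._results.get((table, query))

    def put(self, table, query, result):
        self._results[(table, query)] = result

    def on_write(self, table):
        # Evict every cached query result that belongs to this table.
        for key in [k for k in self._results if k[0] == table]:
            del self._results[key]


class KeyLevelCache:
    """Toy model: the application evicts only the object it changed."""

    def __init__(self):
        self._objects = {}  # key -> object

    def get(self, key):
        return self._objects.get(key)

    def put(self, key, obj):
        self._objects[key] = obj

    def on_write(self, key):
        self._objects.pop(key, None)


query_cache = TableLevelCache()
query_cache.put("users", "SELECT * FROM users WHERE id = 1", ("alice",))
query_cache.put("users", "SELECT * FROM users WHERE id = 2", ("bob",))
query_cache.on_write("users")   # one write to any row in users

app_cache = KeyLevelCache()
app_cache.put("user:1", {"name": "alice"})
app_cache.put("user:2", {"name": "bob"})
app_cache.on_write("user:1")    # only user 1 changed
```

After a single write, the table-level cache has lost both entries, while the key-level cache still holds `user:2`.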

monotux
+1  A: 

The performance considerations related to the network roundtrips have correctly been pointed out.

To that, it must be added that caching data anywhere other than in the DBMS (NOT the "database") creates a problem: potentially obsolete data is still presented as being up to date.

Giving in to the temptation of performance improvement comes at the expense of the guarantee (watertight, or at least close to it) that the data is reliably correct and consistent.

Consider this every time accuracy and consistency are crucial.
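A tiny Python sketch of how that goes wrong. The `database` dictionary is a stand-in for the DBMS and all function names are hypothetical; the write that bypasses the cache plays the role of another client or process changing the data.

```python
database = {"balance": 100}   # hypothetical stand-in for the DBMS
cache = {}

def read_balance():
    """Read through the cache; never re-checks the database once cached."""
    if "balance" not in cache:
        cache["balance"] = database["balance"]
    return cache["balance"]

def write_balance_bypassing_cache(value):
    """A write from another client or process: the cache never hears of it."""
    database["balance"] = value

def write_balance_with_invalidation(value):
    """A write that also evicts the cached copy, so the next read is fresh."""
    database["balance"] = value
    cache.pop("balance", None)

before = read_balance()               # loads 100 and caches it
write_balance_bypassing_cache(50)
stale = read_balance()                # cached copy is now out of date
write_balance_with_invalidation(25)
fresh = read_balance()                # eviction forced a reload
```

Here `stale` is still 100 even though the database holds 50. Invalidating on write only helps for writes that go through the application; it cannot see changes made elsewhere.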

Erwin Smout
+1  A: 

No doubt modern databases provide caching facilities, but when your site has heavy traffic and you need to perform many database transactions, the database cache alone will not give you high performance. In that case the Hibernate cache can help. It stores data already loaded from the database, so when the application wants that data again it does not have to go back to the database. Both access time and traffic between the application and the database are reduced.

Rupeshit
+1  A: 

A lot of good answers here. I'll add one other point: I know my access pattern, the database doesn't.

Depending on what I'm doing, I know that if the data ends up stale, that's not really a problem. The DB doesn't, and would have to reload the cache with the new data.

I know that I'll come back to a piece of data a few times over the next while, so it's important to keep around. The DB has to guess at what to keep in the cache; it doesn't have the information I do. So if I fetch it from the DB over and over, it may not be in cache if the server is busy. I could get a cache miss. With my cache, I can be sure I get a hit. This is especially true on data that is non-trivial to get (e.g. a few joins, some group functions) as opposed to just a single row. Getting a row with the primary key of 7 is easy for the DB, but if it has to do some real work, the cost of the cache miss is much higher.
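One way to encode that knowledge, sketched in Python: a small time-to-live cache where the application chooses how stale a value may get. `TTLCache` and `run_sales_report` are invented for this example, and a fake clock is injected so the behaviour is deterministic.

```python
import time

class TTLCache:
    """Keep each value for `ttl` seconds; the application picks the ttl."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # hit: still within the tolerated age
        value = loader()           # miss or expired: do the real work
        self._entries[key] = (now + self._ttl, value)
        return value


# Fake clock and fake query so the example is deterministic.
fake_now = [0.0]
report_runs = 0

def run_sales_report():
    """Hypothetical expensive query (joins, GROUP BY); counts its own runs."""
    global report_runs
    report_runs += 1
    return {"total": 1234, "run": report_runs}

report_cache = TTLCache(ttl=60, clock=lambda: fake_now[0])

a = report_cache.get_or_load("sales_report", run_sales_report)  # miss
fake_now[0] = 30.0
b = report_cache.get_or_load("sales_report", run_sales_report)  # hit
fake_now[0] = 61.0
c = report_cache.get_or_load("sales_report", run_sales_report)  # expired
```

The 60-second figure is arbitrary. The point is that I pick it per piece of data, based on how much staleness I can tolerate and how costly a miss is.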

MBCook