Say you have a 4-node J2EE application server cluster, all running instances of a Hibernate application. How does caching work in this situation? Does it do any good at all? Should it simply be turned off?

It seems to me that data on one particular node would become stale quickly as other users hitting other nodes make changes to database data. In such a situation, how could Hibernate ever trust that its cache is up to date?

+1  A: 

First of all, there are two caches in Hibernate. There is the first-level cache, which you cannot remove; it is the Hibernate session. Then there is the second-level cache, which is optional and pluggable (e.g. Ehcache). It works across many requests and is most probably the cache you are referring to.

If you work in a clustered environment, you need a second-level cache that can replicate changes across the members of the cluster. Ehcache can do that. Caching is a hard topic and you need a deep understanding of it in order to use it without introducing other problems. Caching in a clustered environment is slightly more difficult still.
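As a sketch only, enabling the optional second-level cache usually comes down to a couple of configuration properties; the exact provider class depends on your Hibernate and Ehcache versions (the value below is from the Hibernate 3.x era):

```properties
# hibernate.properties fragment: turn on the optional second-level cache
# and plug in Ehcache as the provider (version-dependent class name).
hibernate.cache.use_second_level_cache=true
hibernate.cache.provider_class=net.sf.ehcache.hibernate.EhCacheProvider
```

For cluster-safe behaviour you would additionally configure Ehcache itself (in ehcache.xml) to replicate or invalidate across nodes.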

cherouvim
So if you're forced to use a clustered environment, you're effectively "forced" to configure and use Ehcache so Hibernate's caching works? (And what happens if you don't?)
Crusader
If you don't use a cache then everything will work as expected, but you will not have caching. Nobody forces you to use a cache, and nobody forces you to use a replicated cache. Your application architecture may be completely stateless and your caches may only relate to per-user data; in that case a sticky-session setup will do the job.
cherouvim
I'm not interested in using caching, but I am interested in whether Hibernate has some caching functionality which may send users back stale/incorrect cached data. It sounds like this is not an issue because, in between each user's requests, a new session (a new first-level cache) is used, correct?
Crusader
You won't get any stale data, since the first-level cache is bound to the (DB) transaction. It starts when the request starts and it ends when the request ends. If you reload the same row many times within that unit of work, the first-level cache kicks in and only one query hits the database. This is a good thing, not a bad one. You can also override this behaviour and hit the database if that is what you want (but I don't think you should care about this).
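That behaviour can be sketched without Hibernate at all. The following toy identity map (all names hypothetical, not Hibernate's API) shows why repeated loads within one unit of work cost a single database hit, and why a fresh unit of work per request means no stale data:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a first-level cache (identity map) scoped to one unit of work.
// Hibernate's real Session does far more; this only illustrates the caching idea.
class UnitOfWork {
    private final Map<Long, String> firstLevelCache = new HashMap<>();
    private int databaseHits = 0; // counts simulated SELECTs

    String load(long id) {
        // Return the cached row if this unit of work already loaded it,
        // otherwise "query" the database once and remember the result.
        return firstLevelCache.computeIfAbsent(id, this::selectFromDatabase);
    }

    private String selectFromDatabase(long id) {
        databaseHits++; // a real implementation would issue a SELECT here
        return "row-" + id;
    }

    int getDatabaseHits() { return databaseHits; }
}
```

Loading the same id three times in one `UnitOfWork` issues one simulated SELECT; a new `UnitOfWork` (a new request) starts empty, which is why staleness across requests is not a concern for the first-level cache.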
cherouvim
+2  A: 

First of all, you should clarify what cache you're talking about, Hibernate has 3 of them (the first-level cache aka session cache, the second-level cache aka global cache and the query cache that relies on the second-level cache). I guess the question is about the second-level cache so this is what I'm going to cover.

How does caching work in this situation?

If you want to cache read only data, there is no particular problem. If you want to cache read/write data, you need a cluster-safe cache implementation (via invalidation or replication).
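With Hibernate annotations, for instance, the concurrency strategy is declared per entity; `READ_ONLY` is safe and cheap for immutable data, while `READ_WRITE` data is what needs the cluster-safe implementation discussed here (the entity below is purely illustrative):

```java
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;
import javax.persistence.Entity;
import javax.persistence.Id;

// Illustrative entity: cached read/write data needs a cluster-safe provider.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Product {
    @Id
    private Long id;
    private String name;
}
```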

Does it do any good at all?

It depends on a lot of things: the cache implementation, the frequency of updates, the granularity of cache regions, etc.

Should it simply be turned off?

Second-level caching is actually disabled by default. Turn it on if you want to use it.

It seems to me that data on one particular node would become stale quickly as other users hitting other nodes make changes to database data.

Which is why you need a cluster-safe cache implementation.

In such a situation, how could Hibernate ever trust that its cache is up to date?

Simple: Hibernate trusts the cache implementation, which has to offer a mechanism to guarantee that the cache of a given node is not out of date. The most common mechanism is synchronous invalidation: when an entity is updated, the updating node sends a notification to the other members of the cluster telling them that the entity has been modified. Upon receipt of this message, the other nodes remove this data from their local cache, if it is stored there.
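A toy sketch of that invalidation protocol (all names hypothetical; real providers such as a replicated Ehcache setup handle message delivery, ordering and failures):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy cluster: each node holds a local cache; an update on one node
// synchronously tells every peer to evict its copy (invalidation,
// as opposed to replicating the new value).
class Node {
    final Map<Long, String> localCache = new HashMap<>();
    private final List<Node> peers = new ArrayList<>();

    void join(Node peer) { peers.add(peer); }

    void update(long id, String value) {
        localCache.put(id, value);      // write-through on the updating node
        for (Node peer : peers) {
            peer.localCache.remove(id); // peers drop the now-stale entry
        }
    }
}
```

After node A updates an entity that node B had cached, B's copy is gone, so B's next read misses its cache and goes back to the database for the fresh value.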

Pascal Thivent
I don't need or want to use any special caching. I'm more concerned with the possible introduction of bugs (from stale data) as a result of **any** level of (default) caching that Hibernate may do. I don't want to implement caching, just determine whether anything special needs to be done within a *clustered* environment to prevent use of stale data.
Crusader
@Crusader The first-level cache is typically per-request, you won't get any stale data. However, you'll have to deal with concurrency, using optimistic or pessimistic locking. But this is not specific to clustered environments.
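Optimistic locking can be sketched as a version check at write time (hypothetical names; with Hibernate you would normally just add a `@Version` field and let it raise a stale-state exception for you):

```java
// Toy optimistic-lock check: a write succeeds only if the version read
// earlier still matches the current version; otherwise a concurrent
// update happened in between and the write is rejected.
class VersionedRow {
    private String value;
    private long version = 0;

    long read() { return version; }      // caller remembers this version

    boolean write(long expectedVersion, String newValue) {
        if (version != expectedVersion) {
            return false;                // concurrent update detected
        }
        value = newValue;
        version++;                       // bump so stale writers fail
        return true;
    }

    String getValue() { return value; }
}
```

The first writer wins; a second write based on the old version is rejected and the caller must re-read and retry, which is exactly how `@Version`-based optimistic locking behaves.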
Pascal Thivent