views: 213
answers: 5
This seems like perhaps a naive question, but I got into a discussion with a co-worker in which I argued that there is no real need for a cache to be thread-safe/synchronized. My assumption is that it does not matter who puts in a value, because the value for a given key should be "constant" (it ultimately comes from the same source). If the values can change readily, then the cache itself does not seem to be all that useful (in that if you care that the value is "currently correct", you should go to the original source).

The main reason I see to make at least the GET synchronized is that a cache miss may be very expensive, and you don't want multiple threads each going out to fetch the value and put it back into the cache. Even then, you'd need something that actually blocks all consumers during the whole read-fetch-put cycle.
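
To make that concrete, here is a rough sketch of what I mean by blocking the whole cycle (one possible approach, not necessarily the right one; the load method is a hypothetical stand-in for the expensive lookup):

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch: one lock covers the whole read-fetch-put cycle, so only one
// thread at a time performs the expensive load for a missing key, and every
// other consumer blocks until it is done.
abstract class BlockingCache<K, V> {
    private final Map<K, V> map = new HashMap<K, V>();

    // Hypothetical expensive lookup (database call, recalculation, ...).
    protected abstract V load(K key);

    public synchronized V get(K key) {
        V value = map.get(key);
        if (value == null) {
            value = load(key);   // done while holding the lock
            map.put(key, value);
        }
        return value;
    }
}
```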

Anyhow, my working assumption is that a hash is by its very nature thread-safe, because for any {key, value} combination the value is either null or something for which it doesn't matter who got there "first" to write it.

Question is: Is this a reasonable assumption?

Update: The real scope of my question is around very simple id->value style caches (or {parameters}->{calculated value}), where no matter who writes to the cache, the value will be the same and we are just trying to save on "re-calculating"/going back to the database. The actual object graph isn't relevant, and the cache is generally long-lived.

+1  A: 

As long as the cost of acquiring and releasing a lock is less than the cost of recreating the object (from a file or database or whatever), all accesses to a cache should indeed be synchronized. If it's not, you don't really need a cache at all. :)

Bombe
+2  A: 

For most implementations of a hash, you'd need to synchronize. What if the hash table needs to be expanded/rehashed? What if two threads are trying to add entries whose keys are different but whose hashes collide? They could both be modifying the same slot in the hash table in different ways at the same time. Assuming you're using a hash table to implement your cache (which you imply in your question), I suggest reading a little about the details of how hash tables are implemented if you're not already familiar with this.
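
If you do want to keep a plain hash table underneath, one option (a minimal sketch, not the only approach) is to wrap it so every call goes through a single lock:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapExample {
    public static void main(String[] args) {
        // A plain HashMap has no internal locking, so concurrent puts that
        // trigger a resize/rehash (or land in the same bucket) can corrupt it.
        // Collections.synchronizedMap serializes every call through one lock.
        final Map<Integer, String> cache =
                Collections.synchronizedMap(new HashMap<Integer, String>());

        // Writes from several threads are now safe, though each get/put
        // still locks the entire map.
        for (int i = 0; i < 4; i++) {
            final int id = i;
            new Thread(new Runnable() {
                public void run() {
                    cache.put(id, "value-" + id);
                }
            }).start();
        }
    }
}
```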

dsimcha
Very good point... when re-generating the buckets, it's going to use what it sees as the current set of keys/values, and if two of the rehash() calls are fighting, the results could be interesting. Thanks!
jdewald
+2  A: 

Writes aren't always atomic. You must either use atomic data types or provide some synchronization (RCU, locks, etc.). No shared data is thread-safe per se. Or make this go away by sticking to lock-free algorithms (that is, where possible and feasible).
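
As a rough sketch of the "atomic data type" route for a single cached value (the compute method is hypothetical and stands in for the real source):

```java
import java.util.concurrent.atomic.AtomicReference;

// The result is published through an AtomicReference, so readers never observe
// a half-written value and no lock is taken. Several threads may race to
// compute it, but only one result "wins" via compareAndSet.
abstract class LazyValue<V> {
    private final AtomicReference<V> ref = new AtomicReference<V>();

    // Hypothetical expensive computation.
    protected abstract V compute();

    public V get() {
        V current = ref.get();
        if (current != null) {
            return current;
        }
        V computed = compute();                 // may run in more than one thread
        return ref.compareAndSet(null, computed)
                ? computed
                : ref.get();                    // another thread published first
    }
}
```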

Eduard - Gabriel Munteanu
+1  A: 

If you want to avoid data corruption, you must synchronize. This is especially true when the cache contains multiple tables that must be updated atomically. Imagine you have a database for a DMV (department of motor vehicles). When you add a new person to the database, that person will have records for auto registrations, records for tickets received, records for home address, and perhaps other contact information. If you don't update these tables atomically -- in the database and in the cache -- then any client pulling data out of the cache may get inconsistent data.

Yes, any one piece of data may be constant, but databases very commonly hold data that -- if not updated together and atomically -- can cause database clients to get incorrect or incomplete or inconsistent results.
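
As a rough sketch of that point (hypothetical table names, not a real schema), both related maps have to be updated while holding one lock so a reader can never see one updated without the other:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Two related "tables" held in one cache. Both updates happen inside a single
// synchronized block, so no reader can observe the registrations without the
// matching tickets (or vice versa).
class DmvCache {
    private final Map<String, List<String>> registrationsByPerson =
            new HashMap<String, List<String>>();
    private final Map<String, List<String>> ticketsByPerson =
            new HashMap<String, List<String>>();

    public synchronized void addPerson(String personId,
                                       List<String> registrations,
                                       List<String> tickets) {
        registrationsByPerson.put(personId, registrations);
        ticketsByPerson.put(personId, tickets);
    }

    public synchronized List<String> registrationsFor(String personId) {
        return registrationsByPerson.get(personId);
    }

    public synchronized List<String> ticketsFor(String personId) {
        return ticketsByPerson.get(personId);
    }
}
```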

Eddie
I should perhaps have scoped my question more. I am referring to a cache of simple objects (id->value). If there is concern about the "dirtiness" of the object, then the cache I am considering (a long-lived LRU cache that is not getting updates from the source) wouldn't be appropriate.
jdewald
A: 

If you are using Java 5 or above, you can use a ConcurrentHashMap. This supports multiple readers and writers in a thread-safe manner.
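
For the simple id->value case in the question, a minimal sketch might look like this (putIfAbsent has been on ConcurrentMap since Java 5; loadFromDatabase is a hypothetical stand-in for the real source):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal id -> value cache on top of ConcurrentHashMap. Several threads may
// compute the same value on a miss, but since the value is the same no matter
// who computes it (as in the question), redundant work is the only cost.
class SimpleCache {
    private final ConcurrentMap<Long, String> map =
            new ConcurrentHashMap<Long, String>();

    public String get(Long id) {
        String value = map.get(id);
        if (value == null) {
            value = loadFromDatabase(id);          // hypothetical expensive lookup
            String existing = map.putIfAbsent(id, value);
            if (existing != null) {
                value = existing;                  // another thread won the race
            }
        }
        return value;
    }

    // Stand-in for the real source; hypothetical.
    private String loadFromDatabase(Long id) {
        return "value-" + id;
    }
}
```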

Fortyrunner
This is not sufficient if the cache contains multiple tables that need to be updated atomically.
Eddie