views:

183

answers:

5

I think i have a pretty good idea about the volatile keyword in java, but i'm thinking about re-factoring some code and i thought it would be a good idea to use it.

i have a class that is basically working as a DB Cache. it holds a bunch of objects that it has read from a database, serves requests for those objects, and then occasionally refreshes the database (based on a timeout). Heres the skeleton

public class Cache
{
    private HashMap mappings =....;
    private long last_update_time;
    private void loadMappingsFromDB()
    {
        //....
    }
    private void checkLoad()
    {
        if(System.currentTimeMillis() - last_update_time > TIMEOUT)
            loadMappingsFromDB();
    }
    public Data get(ID id)
    {
        checkLoad();
        //.. look it up
    }
}

So the concern is that loadMappingsFromDB could be a high latency operation and thats not acceptable, So initially i thought that i could spin up a thread on cache startup and then just have it sleep and then update the cache in the background. But then i would need to synchronize my class (or the map). and then i would just be trading an occasional big pause for making every cache access slower.

Then i thought why not use volatile

i could define the map reference as volatile

private volatile HashMap mappings =....;

and then in get (or anywhere else that uses the mappings variable) i would just make a local copy of the reference:

public Data get(ID id)
{
    HashMap local = mappings;
    //.. look it up using local
}

and then the background thread would just load into a temp table and then swap the references in the class

HashMap tmp;
//load tmp from DB
mappings = tmp;//swap variables forcing write barrier

Does this approach make sense? and is it actually thread-safe?

A: 

I think this general approach is valid (re-loading a cache on a background thread, not preventing access to the cache while the load is in progress by loading data to separate instances), but I'm not sure what declaring the reference as volatile really gives you.

You could just as easily have the get(id) method and the portion of loadMappingsFromDB() which overwrites the reference (not the entire load from DB, but just the re-assignment of mappings) synchronize on the same lock.

Honestly though I would look into re-using an established caching library like EhCache which has features for background loading or loading-on-startup, as these libraries have likely worked out all of it's synchronization issues a long time ago and you can go back to worrying about the logic of your app rather than the low-level safety of a home-brewed cache.

matt b
i don't want to use synchronized blocks because they incur a performance overhead.
luke
the overhead of a block like `synchronized (lockObject) { mappings = temp; }` is far less than you think. And using volatile has overhead as well.
matt b
volatile only has overhead immediately after a write, because caches have been invalidated (this really depends on the underlying architecture, but that is usually how the barrier works). while synchronized, imposes the same penalty plus a number of method calls and other cheks as well as potential blocking whereas volatile will always read a value immediately.
luke
A: 

I will be honest, I don't quite follow all your code but the volatile key word just lets your compiler know that the variable may change outside of your code.

Richard
Naah, not really. That may be the c/c++ definition. From jls: A field may be declared volatile, in which case the Java memory model (§17) ensures that all threads see a consistent value for the variable.
aioobe
You are right aioobe, I come from more of a c background, more so with interfacing with hardware. So if I am reading your definition correctly, would volatile in Java insure the children threads have access to the same parent variable?
Richard
In Java, `volatile` just means that a variable's value is checked against main memory for thread safety purposes. I'll update my answer with a link to the JLS.
Lord Torgamus
A: 

Actually, your proposed approach makes sense even without using 'volatile' keyword. Because reference assignment (mappings = tmp;) is an atomic operation (translates to one machine command) there's no chance of memory inconsistence:
http://java.sun.com/docs/books/tutorial/essential/concurrency/atomic.html
Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).

The idea behind 'volatile' keyword is that if you change such variable at time T, any other thread accessing it at time T + x (x > 0) will see new value. Otherwise, even after value is changed, there's still some time period during which other threads may see old value. But that doesn't seem to be a problem for you.
edit
Same idea is described in the link above.

Nikita Rybak
It would not be "safe publication" if `mappings` weren't volatile. The reference would not be updated immediately, and the values could be stale when it was.
gustafc
Atomic operations are _not quite guaranteed_ to be thread-safe: http://www.ibm.com/developerworks/java/library/it/it-1001art24/index.html#3
Lord Torgamus
@gustafc If I understand poster correctly, few seconds of delay (which is unlikely already) won't be a problem here. It's not like value will be 'stale' for a year. But I agree that it's not desirable in many conditions. I just described the behaviour and it's for author to decide whether use it or not.
Nikita Rybak
@Nikita, that's not the kind of staleness I mean. The client could see an incompletely constructed `HashMap` if the reference wasn't volatile.
gustafc
@Lord Torgamus Nobody's talking about full thread-safety. I just point out that caveats described in your link aren't a threat here (with one assignment every couple of minutes). Having 'happens-before relationship' doesn't seem to be critical here.
Nikita Rybak
@gustafc Sorry, but that sounds like sci-fi to me. If object is fully constructed/populated in one thread and then passed to another, how can second thread see only part of it? I'll be very interested in seeing any links confirming it. You'll really broaden my horizon in this case :)
Nikita Rybak
Point taken, but the language you used was pretty... expansive.
Lord Torgamus
@Lord Torgamus Not sure I understand what 'expansive' means here. If it was harsh or 'offensive', then I'm sorry, it wasn't meant to be :)
Nikita Rybak
@Nikita, explanations are sparse, but my understanding is that different memory pages may be flushed at different times, so a reference pointing to an object on another page could point at a page which hasn't been flushed. http://www.ibm.com/developerworks/java/library/j-jtp06197.html#2.2
gustafc
@gustafc Thanks, the link is very interesting. I thought it's logical to assume that reference is 'refreshed' completely, but apparently it's not.
Nikita Rybak
@Nikita, no, you weren't offensive at all. It's just that you didn't mention full thread safety until you responded to my comment, and I thought someone reading your answer later might get the wrong idea.
Lord Torgamus
@Nikita, it *does* seem logical that everything is published at once, but the only safe assumption when it comes to concurrency is that there are no safe assumptions :)
gustafc
+1  A: 

Does this approach make sense? and is it actually thread-safe?

It does make sense, and it is thread safe. To some extent, anyway. Some things to think about:

  • When updating, you let the application read old, out-dated values. Is this what you intended? It's fine for some applications, in other cases you might want to block until the cache has been updated (FutureTask makes this behavior rather easy).
  • When loadMappingsFromDB() kicks in, the thread originally calling get(ID) will block until the update is finished.
  • Several threads may call checkLoad() concurrently, which means that if the reload is slow and you have several threads calling get(ID), you may end up with craploads of concurrent updates. While the result would be the same, it would be a waste of system resources. An easy way to fix it in your current code would be having an AtomicBoolean which you check before updating:

    private final AtomicBoolean isUpdating = new AtomicBoolean(false);
    private void checkLoad()
    {
        if (System.currentTimeMillis() - last_update_time <= TIMEOUT) return;
        if (!isUpdating.compareAndSet(false, true)) return; // already updating
        try {
            loadMappingsFromDB();
        } finally {
            isUpdating.set(false);
        }
    }
    
gustafc
@gustfc these are all good points in general, but they don't really matter for my case. only one thread will be calling get so only that thread would ever call checkUpdate at a time, the question was more about the safety of modifying this shared variable. Also its ok if there are 'stale' reads because we are just polling for changes so the data might have been stale for a while before we attempted an update (and that ok for my use case). But thanks for the code snippet, i haven't used AtomicBoolean before, so +1 :)
luke
+2  A: 

There's some misinformation in the existing answers to this question. Use of volatile actually is a good step in ensuring thread safety. See item 3 in Dispelling Java programming language myths by IBM's Peter Haggar. Haggar gives a bit of background and an example, but the nut is this:

So, how can atomic operations not be thread safe? The main point is that they may indeed be thread safe, but there is no guarantee that they are. Java threads are allowed to keep private copies of variables separate from main memory.

By using volatile, you will guarantee that threads are referring to main memory and not using private copies of variables that you don't know about or expect.

To answer your question, then: yes, your strategy is safe.

EDIT:
In response to another post, here is the JLS section about volatile fields.

Lord Torgamus
thats a nice article, thanks
luke