views:

167

answers:

1

So in the meanwhile we know that double-checked-locking as is does not work in C++, at least not in a portable manner.

I just realised I have a fragile implementation in a lazy-quadtree that I use for a terrain ray tracer. So I tried to find a way to still use lazy initialization in a safe manner, as I wouldn't like to quadruple memory usage and re-order large parts of implemented algorithms.

This traversal is inspired by the pattern on page 12 of C++ and the Perils of Double-Checked Locking, but tries to do it cheaper:

(pseudo code!)

struct Foo {
    bool childCreated[4];
    Mutex mutex[4];
    Foo child[4];

    void traverse (...) {
        ...
        if (!childCreated[c]) { 
            // get updated view
            #pragma flush childCreated[c]
            if (!childCreated[c]) { 
                ScopedLock sl (mutex[c]);
                if (!childCreated[c]) {
                    create (c);
                    #pragma flush childCreated[c]  
                    childCreated[c] = true;
                }
            }
        }
    }
}

It is assumed that #pragma flush would also serve as a hard sequence point where compilers and processors won't be allowed to re-order operations across them.

Which problems do you see?

edit: Version 2, trying to take into account Vlads answer (introduce third flush):

(pseudo code!)

struct Foo {
    bool childCreated[4];
    Mutex mutex[4];
    Foo child[4];

    void traverse (...) {
        ...
        if (!childCreated[c]) { 
            // get updated view
            #pragma flush childCreated[c]
            if (!childCreated[c]) { 
                ScopedLock sl (mutex[c]);
                #pragma flush childCreated[c]
                if (!childCreated[c]) {
                    create (c);
                    #pragma flush childCreated[c]
                    childCreated[c] = true;
                }
            }
        }
    }
}

edit: Version 3, I somehow find this pretty equivalent to Version 2, because I am not using the child itself but a primitive flag to check for validity, basically relying on a memory barrier between creating a child and writing to that flag.

(pseudo code!)

struct Foo {
    bool childCreated[4];
    Mutex mutex[4];
    Foo child[4];

    void traverse (...) {
        ...
        if (!childCreated[c]) { 
            ScopedLock sl (mutex[c]);
            #pragma flush childCreated[c]
            if (!childCreated[c]) {
                create (c);
                #pragma flush childCreated[c]
                childCreated[c] = true;
            }
        }
    }
}
+3  A: 

It seems that your pattern is not correct. Consider the case when thread #1 executes till after the first #pragma flush. Then the control switches to the thread #2, which goes on and creates a c, the control is taken back just before second #pragma flush. Now the first thread wakes up, and creates the child anew.

Edit: sorry, wrong: it will be unable to take the lock.

Edit 2: no, still correct, because the value will be not flushed in thread #1

Vlad
I added another revision. Though I must admit the day has been long and I am slowly out of brain-flops.
phresnel
I think we still need a barrier between `create (c);` and `childCreated[c] = true;`, otherwise they may be reordered (and a thread #3 may start using `c` which is still not created).
Vlad
upon adding a third revision which is only a double-checked pattern again, I realized exactly that :)But when I see my third revision, it somehat looks too trivial to be correct. And drinking more coffee is not an option anymore, as I am already saturated on that :/
phresnel
Isn't your revision #3 basically the same as the code on page 12 of the PDF you referred to?
Vlad
I think there the barrier is between entering the function and the first check, but my first barrier is after the first check.
phresnel
The second `#pragma` must be indeed a barrier: it's not enough to just flush `childCreated[c]`, the value of `c` must be already created here from the point of view of the other threads.
Vlad
Aside of that, I can't see currently any problems.
Vlad
Still, I can see one: it's not a _triple_ -checked locking any more :-)
Vlad
It makes me still nervous that this code looks too trivial and that the authors of the pdf-file (and many other places where double-checks are discussed) don't just introduce another, primitive variable like I. Admittedly, there are also solutions with additional thread-private booleans, but it seems nothing gets as easy as my third code ... Now I only need to find out how a proper barrier looks in OpenMP (and not one where **all** threads must reach that barrier) :) Thank you very much for your help! (edit: on the other hand, I'll open up another question on that)
phresnel
About the memory barrier in OpenMP, may be this will be helpful: http://msdn.microsoft.com/en-us/library/sz9sd6et(VS.80).aspx
Vlad
Even though I am no Microsoft-Fan, I had this exact tab open for a long time yesterday, plus the original documentation at http://openmp.org/wp/openmp-specifications/ , and as always, Wikipedia with those two interesting articles: http://en.wikipedia.org/wiki/Memory_barrier , http://en.wikipedia.org/wiki/Memory_ordering#Compiler_memory_barrier
phresnel
Indeed, looking at the assembly, `#pragma omp flush` does the same as `__sync_synchronize();` as in that wiki article, and thus both also issue a compiler-level reordering barrier. I think this case is closed :)
phresnel
@phresnel: I am not a MS-fan, too, but I happen to work under Windows environment. And this link was among the first Google suggested. :)
Vlad