ansaurus

Question

fast thread ordering algorithm without atomic CAS

Answer 1

+1 A:

If I have this right you want to map Integer->Integer where the input in an arbitrary 32 bit number, and the output is a number from 0-N where N is the number of threads?

In that case, every time you create a new thread call this method, the returned value is the ID:

integer nextId = 0;
integer GetInteger()
{
    return AtomicIncrement(nextId);
}

This algorithm is obviously O(N)

Assuming several things:

threads never die
You have some kind of atomic increment which increments a number and returns the old value or the new value, the difference is between 0 based and 1 based IDs
Threads call a method asking what their ID is just once when they're created

Martin 2010-04-30 15:31:29

Where can I find an implementation of AtomicIncrement that doesn't rely on Atomic primitives? Atomic Compare-and-Set seems necessary to implement AtomicIncrement.

caffiend 2010-05-02 19:38:07

some platforms have an intrinsic AtomicIncrement operator, if not, then you need to build it with CAS and you're stuffed.

Martin 2010-05-02 20:28:10

Because this only runs once for each thread, you could just use a spinlock. In most cases it won't spin at all, in some cases it will take a while longer but even then not very long unless you're constantly creating hundreds of threads a second.

Martin 2010-05-02 20:29:35

Wouldn't a spinlock also require an atomic operation?

caffiend 2010-05-02 20:51:11

Normally they're implemented with a CAS, I suspect you can do it without one though. See my question here http://stackoverflow.com/questions/2754968/how-to-write-a-spinlock-without-using-cas

Martin 2010-05-02 21:02:59

Of course, a spinlock isn't necessarily required here. You could just use a normal lock. It's a bit slower but that's ok, the runtime of this method is spread across all the time you're starting new threads, and is a tiny overhead for each thread.

Martin 2010-05-03 19:11:29

i'm not sure what you mean by "normal lock"?

caffiend 2010-05-04 06:22:13

just a normal thread lock which puts the thread to sleep and is usually provided by the operating system. in C# it's the "lock" keyword, for example

Martin 2010-05-04 12:53:35

Answer 2

+3 A:

With no claim to being optimal in any sense (there are clearly faster ways to do this with atomic compare-and-set operations, or as Martin indicated, atomic increment)...

Assuming N is known to all the threads, and each thread has a unique non-zero ID value, such as its stack address in 32-bit space...

Use an array of size N in shared space; ensure that this array is initialized to zero.

Each thread owns the first slot in the array that holds an ID lower than or equal to the thread's ID; the thread writes its ID there. This continues until the array is full of non-zero values, and all the values are in decreasing order.

At the completion of the algorithm, the index of the thread's slot in the array is its ordinal number.

Doug Currie 2010-04-30 15:31:30

I'm not sure I completely understand this - making assumptions, it seems like you could have a scenario where all thread but the lowest one have slots. The low one comes in, overwrites the current lowest one. The 2nd-to-lowest now overwrites the 3rd, and this chain continues until all threads have moved by one?

caffiend 2010-05-02 19:43:41

@caffiend, yes, you could certainly get the behavior you describe; that's fine, though worst case kind of behavior. It's fine because the slots aren't final until the algorithm terminates; it's worst case since every thread has to "start over" when a new low ID thread enters the party. Thinking about the worst case, each thread would move its ID in the array at most N times, so it is a O(N-squared) algorithm [Note that the threads only ever need to look at slots at or beyond their present slot in the array, so they only traverse the array once].

Doug Currie 2010-05-02 20:15:52

O(N**2) is very scary with the potential number of threads - worse case unresponsiveness is important, and I think this algorithm needs to run until all threads have numbers.. Thinking about the array, it seems this ought to be solveable in O(log N) using sort techniques?

caffiend 2010-05-02 20:54:07

The O(log N) sort assumes that all IDs are known to one thread, but you said you couldn't "collect" all the threads. I wouldn't be too scared by O(N**2) since the constant factor is very small (a memory access).

Doug Currie 2010-05-03 01:44:10

The memory accesses seem to incur a large penalty when these threads are split across physical devices because only one device can 'own' a memory line. Would it make sense to use a hash-with-secondary-probing method instead of the linear probing?

caffiend 2010-05-03 07:08:40

A randomized/hashing version can work: a thread may (a) have at most one slot with its ID, and (b) write a slot if it contains zero. Each thread chooses a random slot in the array; if it is zero, it writes its ID there; if it is not zero it tries another slot. If the contents of that remains its ID, the thread idles. Otherwise it chooses another random slot and tries again. -- The problem with this algorithm is determining when it is safe to stop waiting; the threads must all wait for the array to fill, and since the slots are written randomly, that means reading all N slots, so O(N*N) again.

Doug Currie 2010-05-03 15:37:03

This seems to be the only viable approach - but the O(N*N) is bothersome and seems noticeable in my experiment. Is there a reason that this MUST be O(N*N), or is there a chance that something clever could make this better?

caffiend 2010-05-04 06:43:57

Lots of people smarter than me have worked on process synchronization without atomic increment or CAS. Their solutions take O(N) per sync even *after* each thread has an ordinal (see Dekker's, Peterson's, and Lamport's solutions on http://en.wikipedia.org/wiki/Mutex). Now with N threads each doing an O(N) sync, guess what, O(N*N).

Doug Currie 2010-05-04 18:34:29

I can see the reason in that, but is it required that the assignment of an ordinal require N x O(N)?

caffiend 2010-05-05 05:37:08

That's seems to be the best I can do without an atomic read-modify-write primitive.

Doug Currie 2010-05-06 17:48:19

ansaurus

tags:

views:

answers:

fast thread ordering algorithm without atomic CAS

related questions