views: 56
answers: 3
  • I have an array with 250k entities (with a size of 20 bytes) and 4 threads.
  • Each thread modifies ~100k random entities (that means, you can't predict which entities it will touch). It can modify the same entity multiple times.
  • Each entity gets modified up to ~10 times.
  • The modification takes ~10^-6 seconds.
  • Each entity gets modified by multiple threads.

The last two points are the most important facts. The 5th point implies that I need a mechanism to protect my entities from getting corrupted due to concurrent access/modification. The 4th point makes me worry whether classical locking mechanisms like mutexes, which block threads, create too much overhead considering the short timespan a lock would be held for.

I've come up with two ideas:

  • Using spinlocks to overcome the overhead (presupposing my assumption about the overhead is correct in the first place).
  • Giving each thread a local copy of the array, which it can modify without interruption. After all threads have finished, merging all copies into one array. This is possible because I'm able to pick a winner if there are multiple copies of an entity (a rough sketch of this idea follows below).
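
A rough sketch of this second idea in C# (the Entity layout, the Version counter used to pick a winner, and the ModifyRandomEntities helper are illustrative assumptions, not from the original post; Parallel.For comes from System.Threading.Tasks in .NET 4):

// Hypothetical 20-byte entity; Version is bumped on every modification and used to pick a winner.
struct Entity
{
    public int Id;
    public float X, Y, Z;
    public int Version;
}

Entity[] entities = new Entity[250000];
const int threadCount = 4;
var localCopies = new Entity[threadCount][];

// Each thread modifies its own private copy - no locking needed during this phase.
Parallel.For(0, threadCount, t =>
{
    var copy = (Entity[])entities.Clone();
    ModifyRandomEntities(copy);              // hypothetical per-thread workload (~100k random writes)
    localCopies[t] = copy;
});

// Merge: for every slot, keep the copy with the highest Version (the "winner").
for (int i = 0; i < entities.Length; i++)
    for (int t = 0; t < threadCount; t++)
        if (localCopies[t][i].Version > entities[i].Version)
            entities[i] = localCopies[t][i];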

What do you recommend? Do you agree with one of my ideas, or do you recommend something else? Does your recommendation change if I change the numbers to the following?

  • 1M entities
  • 8 threads
  • ~500k random accesses
  • ~100 modifications per entity

Please also point me towards implementations in C#/.Net. Thanks in advance.

Additional Information
The entities are value types (structs). I cannot afford to create new objects for each write operation - only modify existing primitives.

+1  A: 

It seems like the simplest solution can fit here: lock the instance the thread is currently manipulating.

I base this on a simple benchmark run with and without locks.

The run that locks each instance takes ~10.09 seconds; the same run without the lock takes ~9.03 seconds:

using System;
using System.Collections.Generic;
using System.Diagnostics;

// Minimal Person type for the benchmark (a reference type, so its instances can be locked).
class Person
{
    public string Name = "";
}

// Benchmark body (e.g. inside Main):
const int numOfPersons = 250000;
var persons = new List<Person>(numOfPersons);
for (int i = 0; i < numOfPersons; i++)
{
    persons.Add(new Person());
}

var rand = new Random();

var sw = Stopwatch.StartNew();

for (int j = 0; j < 100; j++)
{
    for (int i = 0; i < 100000; i++)
    {
        int index = rand.Next(0, numOfPersons);

        Person person = persons[index];
        lock (person)               // lock the instance that is about to be modified
        {
            person.Name += "a";
        }
    }
}

Console.WriteLine(sw.Elapsed);

Since the ratio of elements to threads is large, the expected time each thread spends waiting for a locked instance is negligible.

As can be seen from the example, the time overhead for locking the instances is ~1 second. This code performs 100 × 100,000 modifications on a collection of 250,000 items. That ~1 second is roughly constant no matter what the modifications are.

Elisha
This will only work if the array the OP mentions contains reference types.
chibacity
@chibacity, you're right. If the elements are value types, the lock will be meaningless.
Elisha
@Elisha thanks for your answer. The elements actually are value types, but it would be no problem to extend the array to hold an additional lockable object for each entity.
Dave
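
A minimal sketch of that idea (Entity and its Value field are illustrative stand-ins for the real 20-byte struct): the structs stay in the array and are mutated in place, while a parallel array of plain objects provides something to lock on.

struct Entity { public int Value; /* other fields, ~20 bytes total */ }

Entity[] entities = new Entity[250000];
object[] locks = new object[entities.Length];   // one dedicated lock object per entity
for (int i = 0; i < locks.Length; i++)
    locks[i] = new object();

// Inside each worker thread:
var rand = new Random();
for (int n = 0; n < 100000; n++)
{
    int index = rand.Next(entities.Length);
    lock (locks[index])                         // lock the companion object, not the struct itself
    {
        entities[index].Value++;                // modify the struct directly in the array
    }
}
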
@Elisha which part of your code did you time? 6 seconds seems way too much to process 250k elements. Maybe your code is modification-bound and not locking-bound. What accounts now for 0.45% could be much more in my scenario. As I said, a modification takes ~10^-6 seconds, so iterating over 250k elements would take ~0.25 seconds - on a single-core machine.
Dave
@Dave The example contains string concatenation which probably accounts for the difference. I'm guessing you are merely setting a value which would have much lower overhead.
chibacity
@Dave, I updated the example to measure only the time the modifications take. In addition, it should be noted that the locking time is roughly constant; the percentage figure I used before is misleading.
Elisha
@Elisha thanks for the update. The code is executed from a single thread, right? Do you know how the performance of locking an object that is already locked (by another thread) differs from the current performance? Don't rush to update your example - I will test this myself; you have already helped me a lot by pointing out that this simple solution could be feasible. A rough guess would be fine at this point.
Dave
@Dave, the main issue with locking is that when an item is locked, other threads trying to update it will wait and do nothing until the other thread finishes its update. With roughly 1 thread per ~60,000 items, collisions are rare. I tried a 4-thread version of the example (each thread updated 25,000 items); the overhead of the version with locks was ~1.1 seconds.
Elisha
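
The exact multi-threaded code isn't shown, so the following is only an approximation of how such a 4-thread run might look, reusing Person, persons and numOfPersons from the example above (System.Threading assumed):

const int threadCount = 4;
var threads = new Thread[threadCount];

for (int t = 0; t < threadCount; t++)
{
    threads[t] = new Thread(() =>
    {
        var rand = new Random(Guid.NewGuid().GetHashCode());   // distinct seed per thread
        for (int j = 0; j < 100; j++)
        {
            for (int i = 0; i < 25000; i++)                    // the 100,000 inner updates split across 4 threads
            {
                int index = rand.Next(0, numOfPersons);
                Person person = persons[index];
                lock (person)                                  // per-instance lock, as in the single-threaded run
                {
                    person.Name += "a";
                }
            }
        }
    });
    threads[t].Start();
}

foreach (var thread in threads)
    thread.Join();
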
@Elisha Just to clarify, threads do not simply wait when there is monitor lock contention - they get put to sleep, i.e. they are de-scheduled, which adds a large amount of overhead. 'Waiting' is a more accurate description for spin locks.
chibacity
@chibacity, you're right again :) Waiting is not the correct description; it does put the thread to sleep. Even so, since collisions are unlikely to happen here very often, the expected overhead is tiny.
Elisha
A: 

It seems that your entities are structures (of 20 bytes each). This is a wild guess, because I have no idea what you're actually trying to do, but can't you make those entities immutable reference types?

When you make the entities immutable reference types, your array will only consist of references, which are 4 bytes (or 8 bytes on 64-bit) in size, and changing a reference is always an atomic operation (unless you explicitly change alignment, of course). Changing an entity then means creating a new one and replacing the old reference in the array with the new one. This way changes are atomic. However, you can still lose changes when two threads write to the same slot shortly after each other (but you don't seem to worry about that, because you are talking about 'picking a winner').
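
A minimal sketch of that approach with a hypothetical immutable Entity class (index stands for the slot being updated). A plain reference assignment into the array slot is atomic; the Interlocked.CompareExchange variant additionally retries when another thread has replaced the slot in the meantime, which avoids the lost-update case described above:

sealed class Entity                    // immutable reference type
{
    public readonly int Value;
    public Entity(int value) { Value = value; }
}

Entity[] entities = new Entity[250000];
for (int i = 0; i < entities.Length; i++)
    entities[i] = new Entity(0);

int index = 0;                         // illustrative slot to update

// Simple replacement: atomic, but a concurrent write to the same slot can silently be overwritten.
entities[index] = new Entity(42);

// CAS loop: retry if another thread replaced the slot in the meantime, so no change is lost.
Entity oldEntity, newEntity;
do
{
    oldEntity = entities[index];
    newEntity = new Entity(oldEntity.Value + 1);
} while (Interlocked.CompareExchange(ref entities[index], newEntity, oldEntity) != oldEntity);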

I have no idea what this will do to performance, because you might have deliberately chosen a value-type array over a reference-type array. However, sometimes it is good to make the solution simpler, not harder. This solution might also improve cache locality: an array of references is considerably smaller than an array of 20-byte structs, and with random access to a big struct array that does not fit into the CPU's cache you will have a lot of cache misses.

Steven
"However, you can still loose changes when two threads write to the same slot shortly after each other" - I do worry about that! Picking a winner is possible if I have ALL copies. But in the scenario you describe, one copy gets lost forever and nobody will ever notice.
Dave
You can also use a concurrent queue and add every change to it, or use a queue per thread. This way you won't lose any changes.
Steven
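
A minimal sketch of the queue idea using .NET 4's ConcurrentQueue<T> from System.Collections.Concurrent (the Change type, the index/newValue variables and the apply step are illustrative assumptions):

struct Change
{
    public int Index;      // which entity was modified
    public int NewValue;   // hypothetical payload of the modification
}

var changes = new ConcurrentQueue<Change>();

// In each worker thread, record the change instead of writing it to the shared array directly:
changes.Enqueue(new Change { Index = index, NewValue = newValue });

// After all threads have finished, apply the queued changes from a single thread:
Change change;
while (changes.TryDequeue(out change))
{
    entities[change.Index].Value = change.NewValue;   // or apply the winner-picking rule here
}
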
+1  A: 

As they say, there's more than one way to skin a cat (though why anybody would want a skinned cat is another question) :-)

With 250K objects and 4 threads, you'd have to guess that conflicts will be (relatively) rare. That doesn't mean we can ignore them, but it may affect how we look for them. Testing a critical section is very fast unless there is actually a conflict. That means it might be feasible to check a critical section for every transaction, in the knowledge that relatively few checks will take more than a few CPU ticks.

Is it feasible to create 250K critical sections? Maybe; I'm not sure. You can create a very lightweight spinlock with:

volatile LONG nFlag = 0;                          // shared flag: 0 = free, 1 = taken
while (0 != ::InterlockedExchange(&nFlag, 1)) {}  // spin until we atomically swap 0 -> 1 (acquire)
DoStuff();                                        // the short critical section
nFlag = 0;                                        // release the lock
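
Since the question asks about C#/.NET: the managed counterpart would be Interlocked operations on an int flag (or the built-in System.Threading.SpinLock struct in .NET 4). A minimal sketch, assuming one flag per entity and an entities array as before:

int[] flags = new int[250000];      // 0 = free, 1 = taken, one flag per entity

// Acquire: spin until the flag is atomically changed from 0 to 1.
while (Interlocked.CompareExchange(ref flags[index], 1, 0) != 0) { }

entities[index].Value++;            // the very short modification (~10^-6 s)

// Release: reset the flag with a full fence so the change is visible to other threads.
Interlocked.Exchange(ref flags[index], 0);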

An alternate approach might be to partition the dataset and have each thread work on a unique set of objects. That makes conflicts impossible, so no locking is needed. Depending on the nature of the problem, you might achieve this by having each thread operate on a range of data, or possibly by keeping a queue for each worker thread and having one or more scanning threads identify objects that need processing and push them onto the appropriate processing queue.
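
A short sketch of the range-partitioning variant (ProcessEntity is a stand-in for the real per-entity work); each thread owns a disjoint range of the array, so no locking is required:

int threadCount = 4;
int chunk = entities.Length / threadCount;

Parallel.For(0, threadCount, t =>
{
    int start = t * chunk;
    int end = (t == threadCount - 1) ? entities.Length : start + chunk;   // last thread takes the remainder
    for (int i = start; i < end; i++)
    {
        ProcessEntity(ref entities[i]);   // touches only this thread's own range
    }
});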

Michael J
Thanks for your answer, and for pointing out that the simple lock approach may well be feasible. Thanks for pointing me towards InterlockedExchange - I didn't know about it. Unfortunately, partitioning the dataset is not possible.
Dave