views: 56
answers: 3
  • I have an array with 250k entities (with a size of 20 bytes) and 4 threads.
  • Each thread modifies ~100k random entities (that means, you can't predict which entities it will touch). It can modify the same entity multiple times.
  • Each entity gets modified up to ~10 times.
  • The modification takes ~10^-6 seconds.
  • Each entity gets modified by multiple threads.

The last two points are the most important facts. The 5th point implies that I need a mechanism to protect my entities from getting corrupted due to concurrent access/modification. The 4th point makes me worry whether classical locking mechanisms like mutexes, which block threads, create too much overhead considering the short timespan a lock would be held for.

I've come up with two ideas:

  • Using spinlocks to overcome the overhead (presupposing my assumption about the overhead is correct in the first place).
  • Giving each thread a local copy of the array, which it can modify without interruption. After all threads have finished, merging all copies into one array. This is possible because I'm able to pick a winner if there are multiple copies of an entity (a rough sketch of this idea follows below).
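
A rough sketch of this second idea in C# (the Entity layout, the Version counter used to pick a winner, and the ModifyRandomEntities helper are illustrative assumptions, not from the original post; Parallel.For comes from System.Threading.Tasks in .NET 4):

// Hypothetical 20-byte entity; Version is bumped on every modification and used to pick a winner.
struct Entity
{
    public int Id;
    public float X, Y, Z;
    public int Version;
}

Entity[] entities = new Entity[250000];
const int threadCount = 4;
var localCopies = new Entity[threadCount][];

// Each thread modifies its own private copy - no locking needed during this phase.
Parallel.For(0, threadCount, t =>
{
    var copy = (Entity[])entities.Clone();
    ModifyRandomEntities(copy);              // hypothetical per-thread workload (~100k random writes)
    localCopies[t] = copy;
});

// Merge: for every slot, keep the copy with the highest Version (the "winner").
for (int i = 0; i < entities.Length; i++)
    for (int t = 0; t < threadCount; t++)
        if (localCopies[t][i].Version > entities[i].Version)
            entities[i] = localCopies[t][i];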

What do you recommend? Do you agree with one of my ideas, or do you recommend something else? Does your recommendation change if I change the numbers to the following?

  • 1M entities
  • 8 threads
  • ~500k random accesses
  • ~100 modifications per entity

Please also point me towards implementations in C#/.Net. Thanks in advance.

Additional Information
The entities are value types (structs). I cannot afford to create new objects for each write operation - only modify existing primitives.

+1  A: 

It seems like the simplest solution can fit here: lock the instance the thread is currently manipulating.

I base this on a simple benchmark run with and without locks.

The run that locks each instance takes ~10.09 seconds; the same run without the lock takes ~9.03 seconds:

using System;
using System.Collections.Generic;
using System.Diagnostics;

// Minimal Person type for the benchmark (a reference type, so its instances can be locked).
class Person
{
    public string Name = "";
}

// Benchmark body (e.g. inside Main):
const int numOfPersons = 250000;
var persons = new List<Person>(numOfPersons);
for (int i = 0; i < numOfPersons; i++)
{
    persons.Add(new Person());
}

var rand = new Random();

var sw = Stopwatch.StartNew();

for (int j = 0; j < 100; j++)
{
    for (int i = 0; i < 100000; i++)
    {
        int index = rand.Next(0, numOfPersons);

        Person person = persons[index];
        lock (person)               // lock the instance that is about to be modified
        {
            person.Name += "a";
        }
    }
}

Console.WriteLine(sw.Elapsed);

Since the ratio of elements to threads is large, the expected time each thread spends waiting for a locked instance is negligible.

As can be seen from the example, the time overhead for locking the instances is ~1 second. This code performs 100 × 100,000 modifications on a collection of 250,000 items. That ~1 second is roughly constant no matter what the modifications are.

Elisha
This will only work if the array the OP mentions contains reference types.
chibacity
@chibacity, you're right. If the elements are value types, the lock will be meaningless.
Elisha
@Elisha thanks for your answer. The elements actually are value types, but it would be no problem to extend the array to hold an additional lockable object for each entity.
Dave
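
A minimal sketch of that idea (Entity and its Value field are illustrative stand-ins for the real 20-byte struct): the structs stay in the array and are mutated in place, while a parallel array of plain objects provides something to lock on.

struct Entity { public int Value; /* other fields, ~20 bytes total */ }

Entity[] entities = new Entity[250000];
object[] locks = new object[entities.Length];   // one dedicated lock object per entity
for (int i = 0; i < locks.Length; i++)
    locks[i] = new object();

// Inside each worker thread:
var rand = new Random();
for (int n = 0; n < 100000; n++)
{
    int index = rand.Next(entities.Length);
    lock (locks[index])                         // lock the companion object, not the struct itself
    {
        entities[index].Value++;                // modify the struct directly in the array
    }
}
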
@Elisha which part of your code did you time? 6 seconds seems way too much to process 250k elements. Maybe your code is modification-bound and not locking-bound. What accounts now for 0.45% could be much more in my scenario. As I said, a modification takes ~10^-6 seconds, so iterating over 250k elements would take ~0.25 seconds - on a single-core machine.
Dave
@Dave The example contains string concatenation which probably accounts for the difference. I'm guessing you are merely setting a value which would have much lower overhead.
chibacity
@Dave, I updated the example to measure only the time the modifications take. In addition, it should be noted that the locking time is roughly constant; the percentage figure I used before is misleading.
Elisha
@Elisha thanks for the update. The code is executed from a single thread, right? Do you know how the performance of locking an object that is already locked (by another thread) differs from the current performance? Don't rush to update your example - I will test this myself; you have already helped me a lot by pointing out that this simple solution could be feasible. A rough guess would be fine at this point.
Dave
@Dave, the main issue with locking is that when an item is locked, other threads trying to update it will wait and do nothing until the other thread finishes its update. With roughly 1 thread per ~60,000 items, collisions are rare. I tried a 4-thread version of the example (each thread updated 25,000 items); the overhead of the version with locks was ~1.1 seconds.
Elisha
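
The exact multi-threaded code isn't shown, so the following is only an approximation of how such a 4-thread run might look, reusing Person, persons and numOfPersons from the example above (System.Threading assumed):

const int threadCount = 4;
var threads = new Thread[threadCount];

for (int t = 0; t < threadCount; t++)
{
    threads[t] = new Thread(() =>
    {
        var rand = new Random(Guid.NewGuid().GetHashCode());   // distinct seed per thread
        for (int j = 0; j < 100; j++)
        {
            for (int i = 0; i < 25000; i++)                    // the 100,000 inner updates split across 4 threads
            {
                int index = rand.Next(0, numOfPersons);
                Person person = persons[index];
                lock (person)                                  // per-instance lock, as in the single-threaded run
                {
                    person.Name += "a";
                }
            }
        }
    });
    threads[t].Start();
}

foreach (var thread in threads)
    thread.Join();
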
@Elisha Just to clarify, threads do not simply wait when there is monitor lock contention - they get put to sleep, i.e. they are de-scheduled, which adds a large amount of overhead. 'Waiting' is a more accurate description for spin locks.
chibacity
@chibacity, you're right again :) Waiting is not the correct description; it does put the thread to sleep. Even so, since collisions are unlikely to happen here very often, the expected overhead is tiny.
Elisha
A: 

It seems that your entities are structures (of 20 bytes each). This is a wild guess, because I have no idea what you're actually trying to do, but can't you make those entities immutable reference types?

When you make the entities immutable reference types, your array will only consist of references, which are 4 bytes (or 8 bytes on 64-bit) in size, and changing a reference is always an atomic operation (unless you explicitly change alignment, of course). Changing an entity then means creating a new one and replacing the old reference in the array with the new one. This way changes are atomic. However, you can still lose changes when two threads write to the same slot shortly after each other (but you don't seem to worry about that, because you are talking about 'picking a winner').
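
A minimal sketch of that approach with a hypothetical immutable Entity class (index stands for the slot being updated). A plain reference assignment into the array slot is atomic; the Interlocked.CompareExchange variant additionally retries when another thread has replaced the slot in the meantime, which avoids the lost-update case described above:

sealed class Entity                    // immutable reference type
{
    public readonly int Value;
    public Entity(int value) { Value = value; }
}

Entity[] entities = new Entity[250000];
for (int i = 0; i < entities.Length; i++)
    entities[i] = new Entity(0);

int index = 0;                         // illustrative slot to update

// Simple replacement: atomic, but a concurrent write to the same slot can silently be overwritten.
entities[index] = new Entity(42);

// CAS loop: retry if another thread replaced the slot in the meantime, so no change is lost.
Entity oldEntity, newEntity;
do
{
    oldEntity = entities[index];
    newEntity = new Entity(oldEntity.Value + 1);
} while (Interlocked.CompareExchange(ref entities[index], newEntity, oldEntity) != oldEntity);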

I have no idea what this will do to performance, because you might have deliberately chosen a value-type array over a reference-type array. However, sometimes it is good to make the solution simpler, not harder. This solution might also improve cache locality: an array of references is considerably smaller than an array of 20-byte structs, and with random access to a big struct array that does not fit into the CPU's cache you will have a lot of cache misses.

Steven
"However, you can still loose changes when two threads write to the same slot shortly after each other" - I do worry about that! Picking a winner is possible if I have ALL copies. But in the scenario you describe, one copy gets lost forever and nobody will ever notice.
Dave
You can also use a concurrent queue and add every change to it, or use a queue per thread. This way you won't lose any changes.
Steven
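
A minimal sketch of the queue idea using .NET 4's ConcurrentQueue<T> from System.Collections.Concurrent (the Change type, the index/newValue variables and the apply step are illustrative assumptions):

struct Change
{
    public int Index;      // which entity was modified
    public int NewValue;   // hypothetical payload of the modification
}

var changes = new ConcurrentQueue<Change>();

// In each worker thread, record the change instead of writing it to the shared array directly:
changes.Enqueue(new Change { Index = index, NewValue = newValue });

// After all threads have finished, apply the queued changes from a single thread:
Change change;
while (changes.TryDequeue(out change))
{
    entities[change.Index].Value = change.NewValue;   // or apply the winner-picking rule here
}
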
+1  A: 

As they say, there's more than one way to skin a cat (though why anybody would want a skinned cat is another question) :-)

With 250K objects and 4 threads, you'd have to guess that conflicts will be (relatively) rare. That doesn't mean we can ignore them, but it may affect how we look for them. Testing a critical section is very fast unless there is actually a conflict. That means it might be feasible to check a critical section for every transaction, in the knowledge that relatively few checks will take more than a few CPU ticks.

Is it feasible to create 250K critical sections? Maybe; I'm not sure. You can create a very lightweight spinlock with:

volatile LONG nFlag = 0;                          // shared flag: 0 = free, 1 = taken
while (0 != ::InterlockedExchange(&nFlag, 1)) {}  // spin until we atomically swap 0 -> 1 (acquire)
DoStuff();                                        // the short critical section
nFlag = 0;                                        // release the lock
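
Since the question asks about C#/.NET: the managed counterpart would be Interlocked operations on an int flag (or the built-in System.Threading.SpinLock struct in .NET 4). A minimal sketch, assuming one flag per entity and an entities array as before:

int[] flags = new int[250000];      // 0 = free, 1 = taken, one flag per entity

// Acquire: spin until the flag is atomically changed from 0 to 1.
while (Interlocked.CompareExchange(ref flags[index], 1, 0) != 0) { }

entities[index].Value++;            // the very short modification (~10^-6 s)

// Release: reset the flag with a full fence so the change is visible to other threads.
Interlocked.Exchange(ref flags[index], 0);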

An alternate approach might be to partition the dataset and have each thread work on a unique set of objects. That makes conflicts impossible, so no locking is needed. Depending on the nature of the problem, you might achieve this by having each thread operate on a range of data, or possibly by keeping a queue for each worker thread and having one or more scanning threads identify objects that need processing and push them onto the appropriate processing queue.
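
A short sketch of the range-partitioning variant (ProcessEntity is a stand-in for the real per-entity work); each thread owns a disjoint range of the array, so no locking is required:

int threadCount = 4;
int chunk = entities.Length / threadCount;

Parallel.For(0, threadCount, t =>
{
    int start = t * chunk;
    int end = (t == threadCount - 1) ? entities.Length : start + chunk;   // last thread takes the remainder
    for (int i = start; i < end; i++)
    {
        ProcessEntity(ref entities[i]);   // touches only this thread's own range
    }
});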

Michael J
Thanks for your answer, and for pointing out that the simple lock approach may well be feasible. Thanks for pointing me towards InterlockedExchange - I didn't know about it. Unfortunately, partitioning the dataset is not possible.
Dave