ansaurus

Question

Performance test: sem_t vs. dispatch_semaphore_t and pthread_once_t vs. dispatch_once_t

Answer 1


sem_wait() and sem_post() are heavyweight synchronization facilities that can be used between processes. They always involve round trips to the kernel, and probably always require your thread to be rescheduled. They are generally not the right choice for in-process synchronization. I'm not sure why the named variants would be slower than the anonymous ones...
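
To make the in-process vs. cross-process distinction concrete, here is a minimal sketch of an anonymous sem_t used purely between two threads of one process. The `pshared` argument of sem_init() is what selects thread-only (0) or cross-process (nonzero) use. The helper names (`handoff`, `producer`) are hypothetical. The return value of sem_init() is checked because some platforms (macOS among them, as far as I know) do not support unnamed semaphores. Whether an uncontended sem_wait()/sem_post() pair actually enters the kernel is implementation-dependent, which is exactly what the comments further down debate.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;          /* counts "value published" events */
static int published_value;  /* written by the producer before it posts */

static void *producer(void *arg) {
    published_value = *(const int *)arg;
    sem_post(&ready);        /* increment the count, waking the waiter */
    return NULL;
}

/* Hypothetical helper: hands `value` from a producer thread to the caller
   through a sem_t. Returns the received value, or -1 if setup fails. */
int handoff(int value) {
    pthread_t t;
    /* pshared = 0: only threads of this process use the semaphore.
       A nonzero pshared requires the sem_t to live in memory shared
       between the processes (e.g. an mmap'd region). */
    if (sem_init(&ready, 0, 0) != 0)
        return -1;           /* unnamed semaphores may be unsupported */
    if (pthread_create(&t, NULL, producer, &value) != 0) {
        sem_destroy(&ready);
        return -1;
    }
    while (sem_wait(&ready) != 0 && errno == EINTR) {
        /* interrupted by a signal handler: wait again */
    }
    pthread_join(t, NULL);
    sem_destroy(&ready);
    return published_value;
}
```

sem_post() and sem_wait() synchronize memory per POSIX, so the plain write to `published_value` is visible to the caller once sem_wait() returns.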

Mac OS X is actually pretty good about POSIX compatibility, but the POSIX specifications have a lot of optional functions, and the Mac doesn't implement them all. Your post is actually the first I've ever heard of pthread_barriers, so I'm guessing they're either relatively recent or not all that common. (I haven't paid much attention to pthreads evolution for the past ten years or so.)
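
Where the optional pthread_barrier_t is missing, a barrier can be approximated with a mutex and a condition variable. The sketch below is a hypothetical fallback (`simple_barrier` is not a real API) that mimics the "one thread gets a special return value" behavior of pthread_barrier_wait(). The generation counter makes it reusable across rounds and lets waiters tell a real release apart from a spurious wakeup.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical fallback barrier (not pthread_barrier_t). */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    unsigned threshold;   /* threads that must arrive per round */
    unsigned remaining;   /* threads still expected this round */
    unsigned generation;  /* incremented each time the barrier opens */
} simple_barrier;

int simple_barrier_init(simple_barrier *b, unsigned n) {
    if (n == 0)
        return -1;
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->threshold = b->remaining = n;
    b->generation = 0;
    return 0;
}

/* Returns 1 in exactly one thread per round (the last to arrive), else 0. */
int simple_barrier_wait(simple_barrier *b) {
    pthread_mutex_lock(&b->mutex);
    unsigned gen = b->generation;
    if (--b->remaining == 0) {
        b->generation++;
        b->remaining = b->threshold;   /* reset for the next round */
        pthread_cond_broadcast(&b->cond);
        pthread_mutex_unlock(&b->mutex);
        return 1;
    }
    while (gen == b->generation)       /* guards against spurious wakeups */
        pthread_cond_wait(&b->cond, &b->mutex);
    pthread_mutex_unlock(&b->mutex);
    return 0;
}

void simple_barrier_destroy(simple_barrier *b) {
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->mutex);
}

/* Usage: each worker records its arrival, waits, then checks that every
   arrival is visible. Returns how many workers saw all arrivals. */
#define DEMO_THREADS 4
static simple_barrier demo_barrier;
static int arrived[DEMO_THREADS], saw_all[DEMO_THREADS], ids[DEMO_THREADS];

static void *demo_worker(void *arg) {
    int id = *(int *)arg;
    arrived[id] = 1;
    simple_barrier_wait(&demo_barrier);
    saw_all[id] = 1;
    for (int i = 0; i < DEMO_THREADS; i++)
        if (!arrived[i])
            saw_all[id] = 0;
    return NULL;
}

int barrier_demo(void) {
    pthread_t t[DEMO_THREADS];
    int ok = 0;
    simple_barrier_init(&demo_barrier, DEMO_THREADS);
    for (int i = 0; i < DEMO_THREADS; i++) {
        arrived[i] = saw_all[i] = 0;
        ids[i] = i;
    }
    for (int i = 0; i < DEMO_THREADS; i++)
        if (pthread_create(&t[i], NULL, demo_worker, &ids[i]) != 0)
            return -1;   /* started workers would block forever; give up */
    for (int i = 0; i < DEMO_THREADS; i++) {
        pthread_join(t[i], NULL);
        ok += saw_all[i];
    }
    simple_barrier_destroy(&demo_barrier);
    return ok;
}
```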

The reason the dispatch stuff falls apart under forced extreme contention is probably that, under the covers, its behavior is similar to a spin lock. Your dispatch worker threads are very likely wasting a good chunk of their quanta under the optimistic assumption that the contended resource is going to become available any cycle now... A bit of time with Shark would tell you for sure. The take-home point, though, should be that "optimizing" the thrashing during contention is a poor investment of programmer time. Instead, spend the time optimizing the code to avoid heavy contention in the first place.
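
The spin-lock comparison above is speculation about libdispatch internals; the sketch below only shows what a plain test-and-set spin lock looks like, to make the "wasted quanta" point concrete. A waiter never blocks or calls into the kernel: it keeps retrying the atomic flag, so under heavy contention (or if the holder is preempted) it burns CPU time without making progress. The names `spinlock`, `spin_lock`, and `spin_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical test-and-set spin lock built on C11 atomic_flag. */
typedef struct { atomic_flag held; } spinlock;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* Busy-wait. If the holder has been preempted, this loop can burn
           the rest of the waiter's time slice without making progress. */
    }
}

static void spin_unlock(spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Usage: several threads increment one counter under the spin lock. */
static spinlock demo_lock = SPINLOCK_INIT;
static long demo_counter;
static long demo_iters;

static void *demo_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < demo_iters; i++) {
        spin_lock(&demo_lock);
        demo_counter++;           /* the contended critical section */
        spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if not every thread started. */
long spin_demo(int nthreads, long iters) {
    pthread_t t[16];
    int created = 0;
    if (nthreads < 1 || nthreads > 16)
        return -1;
    demo_counter = 0;
    demo_iters = iters;
    while (created < nthreads &&
           pthread_create(&t[created], NULL, demo_worker, NULL) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == nthreads ? demo_counter : -1;
}
```

The result is exact, but timing it with more threads than cores is a quick way to watch the spinning cost grow.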

If you really have a resource that is an unavoidable bottleneck within your process, putting a semaphore around it is massively suboptimal. Put it on its own serial dispatch queue, and dispatch_async blocks to that queue wherever possible.
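
The real recommendation here is a GCD serial queue (dispatch_queue_create() with DISPATCH_QUEUE_SERIAL, plus dispatch_async()). Since that needs libdispatch and blocks, the sketch below swaps in a portable pthreads analogue of the same idea: only one worker runs tasks, one at a time in FIFO order, so the bottleneck resource is touched by queued tasks alone and needs no lock of its own, while submitters enqueue and return immediately. Every name here (`serial_queue`, `serial_queue_async`, and so on) is hypothetical, and dedicating one thread to the queue is a simplification that GCD itself does not require.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct task { void (*fn)(void *); void *ctx; struct task *next; } task;

typedef struct {
    pthread_mutex_t mutex;    /* protects the task list and `stopping` */
    pthread_cond_t nonempty;
    task *head, *tail;
    int stopping;
    pthread_t worker;
} serial_queue;

static void *serial_queue_run(void *arg) {
    serial_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->mutex);
        while (q->head == NULL && !q->stopping)
            pthread_cond_wait(&q->nonempty, &q->mutex);
        task *t = q->head;
        if (t == NULL) {              /* stopping and fully drained */
            pthread_mutex_unlock(&q->mutex);
            return NULL;
        }
        q->head = t->next;
        if (q->head == NULL)
            q->tail = NULL;
        pthread_mutex_unlock(&q->mutex);
        t->fn(t->ctx);                /* tasks run one at a time, outside the lock */
        free(t);
    }
}

int serial_queue_init(serial_queue *q) {
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
    q->stopping = 0;
    int rc = pthread_create(&q->worker, NULL, serial_queue_run, q);
    if (rc != 0) {
        pthread_cond_destroy(&q->nonempty);
        pthread_mutex_destroy(&q->mutex);
    }
    return rc;
}

/* Analogue of dispatch_async: enqueue the task and return without waiting. */
int serial_queue_async(serial_queue *q, void (*fn)(void *), void *ctx) {
    task *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->fn = fn;
    t->ctx = ctx;
    t->next = NULL;
    pthread_mutex_lock(&q->mutex);
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mutex);
    return 0;
}

/* Runs every task already queued, then stops and joins the worker. */
void serial_queue_drain_and_destroy(serial_queue *q) {
    pthread_mutex_lock(&q->mutex);
    q->stopping = 1;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mutex);
    pthread_join(q->worker, NULL);
    pthread_cond_destroy(&q->nonempty);
    pthread_mutex_destroy(&q->mutex);
}

/* Usage: the "bottleneck resource" is a plain counter only tasks touch. */
static long shared_total;
static void add_to_total(void *ctx) { shared_total += (long)(intptr_t)ctx; }

long serial_queue_demo(int n) {
    serial_queue q;
    shared_total = 0;
    if (serial_queue_init(&q) != 0)
        return -1;
    for (int i = 1; i <= n; i++)
        serial_queue_async(&q, add_to_total, (void *)(intptr_t)i);
    serial_queue_drain_and_destroy(&q);
    return shared_total;
}
```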

Finally, dispatch_once() is faster than pthread_once() because it's spec'd and implemented to be fast on current processors. Apple could probably speed up the pthread_once() implementation, as I suspect the reference implementation uses pthread synchronization primitives, but... well... they've provided all of the libdispatch goodness instead. :-)

Kaelin Colclasure 2010-09-05 18:07:19

Good point about sem_wait/sem_post, since dispatch_semaphores do not have to deal with the context switch into the kernel (it seems like a duh! now ;). I was responsible for adding POSIX compatibility to a home-grown kernel for embedded systems and have found barriers useful for creating unit tests. I was not trying to optimize one particular situation over another, but rather trying to figure out which of the 2 tools I am given is the best tool for the job (if I have the option of driving a screw with a Phillips or a flathead... I will use the Phillips). Updating question with motivation...

Brent Priddy 2010-09-06 10:47:34

pthread_once() can (and should) be implemented with atomics for the "called" check; when waiting for the call to complete, yes, I agree it would use a pthread_mutex to block other threads (that is how I did it). In this test case, though, there is no blocking and no need for a kernel context switch. That said, I still don't understand why there is a 10x difference. I guess libdispatch is more optimized than the POSIX calls.

Brent Priddy 2010-09-06 11:05:43
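
The once-control structure described in the comment above (an atomic "called" check on the fast path, a mutex only while initialization may still be pending) can be sketched roughly like this. It illustrates the technique only; `fast_once` and `fast_once_run` are hypothetical names, not the actual layout of pthread_once_t or dispatch_once_t, and the sketch makes no claim about what causes the measured 10x gap.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical once-control illustrating the technique. */
typedef struct {
    atomic_int done;          /* 0 until init() has returned */
    pthread_mutex_t mutex;    /* serializes first callers, blocks latecomers */
} fast_once;
#define FAST_ONCE_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

void fast_once_run(fast_once *o, void (*init)(void)) {
    /* Fast path after initialization: a single acquire load, no lock. */
    if (atomic_load_explicit(&o->done, memory_order_acquire))
        return;
    /* Slow path: only taken while initialization may still be pending. */
    pthread_mutex_lock(&o->mutex);
    if (!atomic_load_explicit(&o->done, memory_order_relaxed)) {
        init();
        atomic_store_explicit(&o->done, 1, memory_order_release);
    }
    pthread_mutex_unlock(&o->mutex);
}

/* Usage: many threads race to initialize; init() must run exactly once. */
static fast_once demo_once = FAST_ONCE_INIT;
static atomic_int init_calls;

static void demo_init(void) { atomic_fetch_add(&init_calls, 1); }

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++)
        fast_once_run(&demo_once, demo_init);
    return NULL;
}

/* Returns how many times demo_init has run so far. */
int once_demo(int nthreads) {
    pthread_t t[16];
    int created = 0;
    if (nthreads > 16)
        nthreads = 16;
    while (created < nthreads &&
           pthread_create(&t[created], NULL, demo_worker, NULL) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&init_calls);
}
```

After the first call completes, every later call costs one acquire load, which is the uncontended case the benchmark in the question exercises.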

Another observation: anonymous semaphores should operate like dispatch_semaphores, since you can't retrieve them from another process. You should only have to context switch into the kernel when you actually have to block or wake up blocked threads (assuming Apple uses atomics for semaphores, which is what I did for the semaphores in our OS).

Brent Priddy 2010-09-06 19:28:25
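
The "only enter the kernel when blocking or waking" design described above is often called a benaphore. Below is a minimal sketch under that assumption: an atomic counter handles the uncontended cases, and a mutex plus condition variable serve only as the slow path (a negative count means that many threads are blocked or about to block). The `wakeups` counter prevents lost wakeups when a post races ahead of a waiter reaching pthread_cond_wait(). All names are hypothetical; this is not how Apple's sem_t or dispatch_semaphore_t is actually implemented.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical counting semaphore with an atomic fast path.
   count >= 0: permits available; count < 0: -count threads waiting. */
typedef struct {
    atomic_int count;
    pthread_mutex_t mutex;   /* slow path only */
    pthread_cond_t cond;
    int wakeups;             /* wakeups handed out by post, guarded by mutex */
} fast_sem;

void fast_sem_init(fast_sem *s, int initial) {
    atomic_init(&s->count, initial);
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->wakeups = 0;
}

void fast_sem_wait(fast_sem *s) {
    /* Fast path: a permit was available, so there is nothing to block on. */
    if (atomic_fetch_sub_explicit(&s->count, 1, memory_order_acquire) > 0)
        return;
    /* Slow path: block until a post hands this thread a wakeup. */
    pthread_mutex_lock(&s->mutex);
    while (s->wakeups == 0)
        pthread_cond_wait(&s->cond, &s->mutex);
    s->wakeups--;
    pthread_mutex_unlock(&s->mutex);
}

void fast_sem_post(fast_sem *s) {
    /* Fast path: nobody is waiting, so just return the permit. */
    if (atomic_fetch_add_explicit(&s->count, 1, memory_order_release) >= 0)
        return;
    /* Slow path: at least one thread is blocked (or about to block). */
    pthread_mutex_lock(&s->mutex);
    s->wakeups++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

void fast_sem_destroy(fast_sem *s) {
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->mutex);
}

/* Usage: with an initial count of 1 the semaphore acts as a lock. */
static fast_sem demo_sem;
static long demo_counter;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        fast_sem_wait(&demo_sem);
        demo_counter++;
        fast_sem_post(&demo_sem);
    }
    return NULL;
}

/* Returns the final counter value (10000 per started thread). */
long fast_sem_demo(int nthreads) {
    pthread_t t[16];
    int created = 0;
    if (nthreads > 16)
        nthreads = 16;
    fast_sem_init(&demo_sem, 1);
    demo_counter = 0;
    while (created < nthreads &&
           pthread_create(&t[created], NULL, demo_worker, NULL) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    fast_sem_destroy(&demo_sem);
    return demo_counter;
}
```

On Linux the mutex and condition variable are themselves futex-based, so the slow path only makes system calls when threads actually have to sleep or be woken.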

These may be using Mach semaphores on Darwin, either for interoperability or simply because they were already there…

Kaelin Colclasure 2010-09-08 14:24:17
