Finally found out what the issue is. On RHEL 5.4 if we call sem_init then do sem_timedwait we get somewhat random behaviour of the timed wait, depending on where the code is located, whether the object that owns the sem_t is on the heap or stack, etc. Sometimes the timed wait returns immediately with errno = 38 (ENOSYS), sometimes it waits correctly before returning.
Running it via valgrind gives this error:
==32459== Thread 2:
==32459== Syscall param futex(op) contains uninitialised byte(s)
==32459== at 0x406C78: sem_timedwait (in /lib/libpthread-2.5.so)
==32459== by 0x8049F2E: TestThread::Run() (in /home/stsadm/semaphore_test/semaphore_test)
==32459== by 0x44B2307: nxThread::_ThreadProc(void*) (in /home/stsadm/semaphore_test/libcore.so)
==32459== by 0x4005AA: start_thread (in /lib/libpthread-2.5.so)
==32459== by 0x355CFD: clone (in /lib/libc-2.5.so)
If I run exactly the same code on RHEL 5.2 the problem goes away and valgrind reports no errors.
If I do a memset on the sem_t variable before calling sem_init the problem goes away on RHEL 5.4
memset( &_semaphore, 0, sizeof( sem_t ) );
So, it looks like a bug has been introduced with semaphores on RHEL5.4 or something that it uses internally, and sem_init isn't correctly initialising the sem_t memory. Or, sem_timed wait has changed to be sensitive to this in a way it wasn't before.
Interestingly, in no cases does sem_init return an error to indicate it didn't work though.
Alternatively, if the expected behaviour is that sem_init won't intialise the memory of sem_t and that's up to the caller, then the behaviour has certainly changed with RHEL 5.4
pxb
Update - here's the test case code in case anyone else wants to try it. Note the problem only occurs when sem_timedwait is called from a .so, and only RHEL5.4 (maybe 5.3 haven't tested it), and only when built as a 32 bit binary (linking against 32 bit libs of course)
1) in semtest.cpp
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>
void semtest( int semnum, bool initmem )
{
sem_t sem;
if ( initmem )
{
memset( &sem, 0, sizeof( sem_t ) );
printf( "sem %d: memset size = %d\n", semnum, sizeof( sem_t ) );
}
errno = 0;
int res = sem_init( &sem, 0, 0 );
printf( "sem %d: sem_init res = %d, errno = %d\n", semnum, res, errno );
timespec ts;
clock_gettime( CLOCK_REALTIME, &ts );
ts.tv_sec += 1;
errno = 0;
res = sem_timedwait( &sem, &ts );
printf( "sem %d: sem_timedwait res = %d, errno = %d\n\n", semnum, res, errno );
}
2) in main.cpp (note the duplicate test function so we can compare running from within the .so with in the exe)
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>
extern void semtest( int semnum, bool initmem );
void semtest_in_exe( int semnum, bool initmem )
{
sem_t sem;
if ( initmem )
{
memset( &sem, 0, sizeof( sem_t ) );
printf( "sem %d: memset size = %d\n", semnum, sizeof( sem_t ) );
}
errno = 0;
int res = sem_init( &sem, 0, 0 );
printf( "sem %d: sem_init res = %d, errno = %d\n", semnum, res, errno );
timespec ts;
clock_gettime( CLOCK_REALTIME, &ts );
ts.tv_sec += 1;
errno = 0;
res = sem_timedwait( &sem, &ts );
printf( "sem %d: sem_timedwait res = %d, errno = %d\n\n", semnum, res, errno );
}
int main(int argc, char* argv[], char** envp)
{
semtest( 1, false );
semtest( 2, true );
semtest_in_exe( 3, false );
semtest_in_exe( 4, true );
}
3) here's the Makefile
all: main
semtest.o: semtest.cpp
gcc -c -fpic -m32 -I /usr/include/c++/4.1.2 -I /usr/include/c++/4.1.2/i386-redhat-linux semtest.cpp -o semtest.o
libsemtest.so: semtest.o
gcc -shared -m32 -fpic -lstdc++ -lrt semtest.o -o libsemtest.so
main: libsemtest.so
gcc -m32 -L . -lsemtest main.cpp -o semtest
The test cases are:
- run from within .so without doing memset
- run from within .so and do memset
- run from within exe without doing memset
- run from within exe and do memset
And here's the result running on RHEL5.4
sem 1: sem_init res = 0, errno = 0
sem 1: sem_timedwait res = -1, errno = 38
sem 2: memset size = 16
sem 2: sem_init res = 0, errno = 0
sem 2: sem_timedwait res = -1, errno = 110
sem 3: sem_init res = 0, errno = 0
sem 3: sem_timedwait res = -1, errno = 110
sem 4: memset size = 16
sem 4: sem_init res = 0, errno = 0
sem 4: sem_timedwait res = -1, errno = 110
You can see that case 1 returns immediately with errno = 38.
If we run the exact same code on RHEL5.2 we get the following:
sem 1: sem_init res = 0, errno = 0
sem 1: sem_timedwait res = -1, errno = 110
sem 2: memset size = 16
sem 2: sem_init res = 0, errno = 0
sem 2: sem_timedwait res = -1, errno = 110
sem 3: sem_init res = 0, errno = 0
sem 3: sem_timedwait res = -1, errno = 110
sem 4: memset size = 16
sem 4: sem_init res = 0, errno = 0
sem 4: sem_timedwait res = -1, errno = 110
You can see that all cases now work as expected!