Hi,

We're seeing odd behaviour with pthreads sem_timedwait on Red Hat Enterprise Linux systems. It only occurs on version 5.3 onwards.

When we create the semaphore on a background thread with sem_init, no error is returned. When we then call sem_timedwait, it returns immediately with errno = 38 (ENOSYS), indicating that it's not supported.

If we do the same thing on the main thread, it works as expected and we get no error from sem_timedwait.

We don't see it on RHEL 5.2 or earlier. We've tried compiling our code with gcc 3.2.3 and 4.1.2 and get the same result, so it seems to be a run-time issue.

So, my questions (finally ;)

1) Has anyone else seen this?
2) Is it a known issue with RHEL 5.3 onwards?
3) We're using sem_timedwait to sleep a single thread. What alternatives are there on Linux to do the same thing?
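For question 3, one commonly used alternative for putting a single thread to sleep, while still letting another thread wake it early, is a mutex plus a condition variable with pthread_cond_timedwait; if the thread never needs to be woken early, plain nanosleep or clock_nanosleep is enough. A minimal sketch, assuming the standard pthreads API (build with -pthread); Sleeper, sleeper_init, sleeper_wait and sleeper_wake are hypothetical names for illustration, not part of any library:

```cpp
#include <assert.h>
#include <pthread.h>
#include <time.h>

// Hypothetical helper type: a thread can sleep on it with a timeout,
// and another thread can wake it early.
struct Sleeper {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool woken;  // set by sleeper_wake(), consumed by sleeper_wait()
};

void sleeper_init(Sleeper* s) {
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);  // default attributes: CLOCK_REALTIME
    s->woken = false;
}

// Sleeps for up to `seconds` seconds. Returns true if sleeper_wake() was
// called (before or during the wait), false if the timeout expired.
bool sleeper_wait(Sleeper* s, int seconds) {
    timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  // absolute deadline, as with sem_timedwait
    deadline.tv_sec += seconds;

    pthread_mutex_lock(&s->mutex);
    int rc = 0;
    // Loop to absorb spurious wakeups; stop on timeout (ETIMEDOUT) or any other error.
    while (!s->woken && rc == 0)
        rc = pthread_cond_timedwait(&s->cond, &s->mutex, &deadline);
    bool woken = s->woken;
    s->woken = false;  // consume the wake-up so the next sleep starts fresh
    pthread_mutex_unlock(&s->mutex);
    return woken;
}

void sleeper_wake(Sleeper* s) {
    pthread_mutex_lock(&s->mutex);
    s->woken = true;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}
```

The woken flag is what makes this safe: a wake-up that happens before the sleeper starts waiting is not lost, and spurious returns from pthread_cond_timedwait are absorbed by the loop.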

If this is a duplicate of another question, let me know. I've looked but can't find one asking the same thing, only similar ones for OS X, which isn't what we're using.

thanks, pxb

Update: I've just done some more testing, with the following results:

  • if I do a 64-bit build using gcc 4.1.2 on a RHEL5.4 box (with -L/usr/lib64 and -lstdc++ -lrt) and run it on a 64-bit install of RHEL5, it works fine
  • if I do a 32-bit build using gcc 4.1.2 on a RHEL5.1 box (with -L/usr/lib and -lstdc++ -lrt) and run it on exactly the same 64-bit RHEL5 box, we get ENOSYS errors from sem_timedwait

So it appears to be a difference between the 64-bit and 32-bit runtime libs on RHEL5.4 (and seemingly RHEL5.3). The only other difference was that the 32-bit and 64-bit builds were done on RHEL5.1 and RHEL5.4 boxes respectively.
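When comparing 32-bit and 64-bit builds on the same box, it can help to print which C library and threads implementation each binary actually loads at run time. A minimal sketch, assuming glibc: gnu_get_libc_version() and confstr() with _CS_GNU_LIBPTHREAD_VERSION are glibc-specific and are skipped on other C libraries, and describe_runtime is a hypothetical helper name:

```cpp
#include <assert.h>
#include <stdio.h>
#include <string>
#include <semaphore.h>
#include <unistd.h>              // confstr()
#ifdef __GLIBC__
#include <gnu/libc-version.h>    // gnu_get_libc_version() (glibc only)
#endif

// Hypothetical diagnostic helper: describes the runtime this binary loaded,
// e.g. for comparing a 32-bit and a 64-bit build of the same code.
std::string describe_runtime() {
    const char* glibc = "unknown";
#ifdef __GLIBC__
    glibc = gnu_get_libc_version();  // version of the C library loaded at run time
#endif
    char threads[64] = "unknown";
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    // Fills in something like "NPTL 2.5"; the buffer is left untouched on failure.
    confstr(_CS_GNU_LIBPTHREAD_VERSION, threads, sizeof(threads));
#endif
    char buf[256];
    snprintf(buf, sizeof(buf), "%d-bit, glibc %s, threads %s, sizeof(sem_t) = %d",
             (int)(sizeof(void*) * 8), glibc, threads, (int)sizeof(sem_t));
    return std::string(buf);
}
```

Printing this from both builds makes it easy to confirm that the 32-bit binary really is picking up the 32-bit libraries, and which version of them.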

A:

I've finally found out what the issue is. On RHEL 5.4, if we call sem_init and then sem_timedwait, the timed wait behaves somewhat randomly, depending on where the code is located, whether the object that owns the sem_t is on the heap or the stack, and so on. Sometimes the timed wait returns immediately with errno = 38 (ENOSYS); sometimes it waits correctly before returning.

Running it under valgrind gives this error:

==32459== Thread 2:
==32459== Syscall param futex(op) contains uninitialised byte(s)
==32459==    at 0x406C78: sem_timedwait (in /lib/libpthread-2.5.so)
==32459==    by 0x8049F2E: TestThread::Run() (in /home/stsadm/semaphore_test/semaphore_test)
==32459==    by 0x44B2307: nxThread::_ThreadProc(void*) (in /home/stsadm/semaphore_test/libcore.so)
==32459==    by 0x4005AA: start_thread (in /lib/libpthread-2.5.so)
==32459==    by 0x355CFD: clone (in /lib/libc-2.5.so)

If I run exactly the same code on RHEL 5.2 the problem goes away and valgrind reports no errors.

If I do a memset on the sem_t variable before calling sem_init, the problem goes away on RHEL 5.4:

memset( &_semaphore, 0, sizeof( sem_t ) );
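The memset workaround can be wrapped in a small helper so every semaphore in the code base gets the same treatment. A minimal sketch; init_zeroed_semaphore is a hypothetical name, and it only reproduces the workaround reported here rather than addressing whatever changed on RHEL 5.4 to make it necessary:

```cpp
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <semaphore.h>

// Hypothetical wrapper: zero the whole sem_t before handing it to sem_init,
// so no byte of it is left uninitialised.
// Returns 0 on success, or the errno value from sem_init on failure.
int init_zeroed_semaphore(sem_t* sem, unsigned int initial_value) {
    memset(sem, 0, sizeof(sem_t));
    // pshared = 0: shared only between threads of this process.
    if (sem_init(sem, 0, initial_value) != 0)
        return errno;
    return 0;
}
```

Since sem_init is meant to set up the whole object anyway, zeroing it first should be harmless on systems where it isn't needed.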

So it looks like a bug has been introduced in semaphores on RHEL 5.4 (or in something they use internally), and sem_init isn't correctly initialising the sem_t memory. Or sem_timedwait has changed so that it is sensitive to this in a way it wasn't before.

Interestingly, in no case does sem_init return an error to indicate that it didn't work.

Alternatively, if the expected behaviour is that sem_init won't initialise the memory of sem_t and that's up to the caller, then the behaviour has certainly changed with RHEL 5.4.

pxb

Update: here's the test case code in case anyone else wants to try it. Note that the problem only occurs when sem_timedwait is called from a .so, only on RHEL 5.4 (possibly 5.3 too; I haven't tested it), and only when built as a 32-bit binary (linking against the 32-bit libs, of course).

1) in semtest.cpp

#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>

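void semtest( int semnum, bool initmem )
{
        sem_t sem;      // left uninitialised unless initmem is set

        if ( initmem )
        {
                // Workaround: zero the whole sem_t before sem_init
                memset( &sem, 0, sizeof( sem_t ) );
                printf( "sem %d: memset size = %d\n", semnum, (int) sizeof( sem_t ) );
        }

        errno = 0;
        int res = sem_init( &sem, 0, 0 );       // not shared between processes, count 0

        printf( "sem %d: sem_init res = %d, errno = %d\n", semnum, res, errno );

        // sem_timedwait takes an absolute CLOCK_REALTIME deadline: now + 1 second
        timespec ts;
        clock_gettime( CLOCK_REALTIME, &ts );
        ts.tv_sec += 1;

        // The count is 0, so this should time out with ETIMEDOUT after about a second
        errno = 0;
        res = sem_timedwait( &sem, &ts );

        printf( "sem %d: sem_timedwait res = %d, errno = %d\n\n", semnum, res, errno );
}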

2) in main.cpp (note the duplicate test function, so we can compare running from within the .so with running from within the exe)

#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>

extern void semtest( int semnum, bool initmem );

// Same test as semtest() in semtest.cpp, but compiled into the executable
void semtest_in_exe( int semnum, bool initmem )
{
        sem_t sem;

        if ( initmem )
        {
                memset( &sem, 0, sizeof( sem_t ) );
                printf( "sem %d: memset size = %d\n", semnum, (int) sizeof( sem_t ) );
        }

        errno = 0;
        int res = sem_init( &sem, 0, 0 );

        printf( "sem %d: sem_init res = %d, errno = %d\n", semnum, res, errno );

        timespec ts;
        clock_gettime( CLOCK_REALTIME, &ts );
        ts.tv_sec += 1;

        errno = 0;
        res = sem_timedwait( &sem, &ts );

        printf( "sem %d: sem_timedwait res = %d, errno = %d\n\n", semnum, res, errno );
}

int main(int argc, char* argv[], char** envp)
{
        semtest( 1, false );
        semtest( 2, true );
        semtest_in_exe( 3, false );
        semtest_in_exe( 4, true );
}

3) here's the Makefile

all: main

semtest.o: semtest.cpp
        gcc -c -fpic -m32 -I /usr/include/c++/4.1.2 -I /usr/include/c++/4.1.2/i386-redhat-linux semtest.cpp -o semtest.o

libsemtest.so: semtest.o
        gcc -shared -m32 -fpic -lstdc++ -lrt semtest.o -o libsemtest.so

main: libsemtest.so
        gcc -m32 -L . -lsemtest main.cpp -o semtest

The test cases are:

  1. run from within the .so without doing a memset
  2. run from within the .so and do a memset
  3. run from within the exe without doing a memset
  4. run from within the exe and do a memset

And here are the results when running on RHEL5.4:

sem 1: sem_init res = 0, errno = 0
sem 1: sem_timedwait res = -1, errno = 38

sem 2: memset size = 16
sem 2: sem_init res = 0, errno = 0
sem 2: sem_timedwait res = -1, errno = 110

sem 3: sem_init res = 0, errno = 0
sem 3: sem_timedwait res = -1, errno = 110

sem 4: memset size = 16
sem 4: sem_init res = 0, errno = 0
sem 4: sem_timedwait res = -1, errno = 110

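You can see that case 1 returns immediately with errno = 38 (ENOSYS), while the other three cases time out with errno = 110 (ETIMEDOUT), which is the expected result for a semaphore whose count is 0.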
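In application code it can help to classify the result of sem_timedwait explicitly, so that a failure such as the ENOSYS in case 1 is reported as an error instead of being mistaken for a timeout. A minimal sketch under these assumptions: WaitResult and timed_wait are hypothetical names, the timeout is given in milliseconds, and EINTR (a signal interrupting the wait) is simply retried with the same absolute deadline:

```cpp
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

// Hypothetical result codes for a timed semaphore wait.
enum WaitResult { WAIT_ACQUIRED, WAIT_TIMED_OUT, WAIT_FAILED };

// Waits on `sem` for up to `timeout_ms` milliseconds. sem_timedwait takes an
// absolute CLOCK_REALTIME deadline, so the relative timeout is converted first.
// On WAIT_FAILED, *err (if non-null) receives the errno value.
WaitResult timed_wait(sem_t* sem, long timeout_ms, int* err) {
    timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {  // keep tv_nsec within [0, 1e9)
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    for (;;) {
        if (sem_timedwait(sem, &deadline) == 0)
            return WAIT_ACQUIRED;
        if (errno == EINTR)
            continue;               // interrupted by a signal: wait again, same deadline
        if (errno == ETIMEDOUT)
            return WAIT_TIMED_OUT;  // 110 on x86 Linux: nothing was posted in time
        if (err)
            *err = errno;           // anything else (e.g. ENOSYS, 38 on x86 Linux) is a real failure
        return WAIT_FAILED;
    }
}
```

With this, case 1 would surface as WAIT_FAILED with err set to ENOSYS, while cases 2 to 4 would come back as WAIT_TIMED_OUT.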

If we run the exact same code on RHEL5.2, we get the following:

sem 1: sem_init res = 0, errno = 0
sem 1: sem_timedwait res = -1, errno = 110

sem 2: memset size = 16
sem 2: sem_init res = 0, errno = 0
sem 2: sem_timedwait res = -1, errno = 110

sem 3: sem_init res = 0, errno = 0
sem 3: sem_timedwait res = -1, errno = 110

sem 4: memset size = 16
sem 4: sem_init res = 0, errno = 0
sem 4: sem_timedwait res = -1, errno = 110

You can see that all cases now work as expected!

pxb
+1 thanks for the follow-up
pmg
Thanks, this helped me too, though I was on Ubuntu 9.10. My code was working before, and just a small change broke it (it started failing with errno 38). The memset solved the problem.
inazaruk