This follows up on my previous question about GDB not pinpointing the point of the SIGSEGV.

My thread code is as follows:

void *runner(void *unused)
{
    do
    {
        sem_wait(&x);
        ...

        if(/*condition 1 check*/)
        {
            sem_post(&x);
            sleep(5);
            sem_wait(&x);
            if(/*repeat condition 1 check; after at least 5 seconds*/)
            {
                printf("LEAVING...\n");
                sem_post(&x);
                // putting exit(0); here resolves the dilemma
                return(NULL);
            }
        }
        sem_post(&x);
    }while(1);
}

Main code:

sem_t x;

int main(void)
{
    sem_init(&x,0,1);
    ...
    pthread_t thrId;
    pthread_create(&thrId,NULL,runner,NULL);
    ...
    pthread_join(thrId,NULL);
    return(0);
}

Edit: Putting an exit(0) in the runner thread code makes the fault vanish.


What could be the reasons behind the stack corruption?

GDB Output: (0xb7fe2b70 is the runner thread's ID)

LEAVING...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7fe2b70 (LWP 2604)]
0x00000011 in ?? ()

Valgrind Output:

==3076== Thread 2:
==3076== Jump to the invalid address stated on the next line
==3076==    at 0x11: ???
==3076==    by 0xA26CCD: clone (clone.S:133)
==3076==  Address 0x11 is not stack'd, malloc'd or (recently) free'd
==3076== 
==3076== 
==3076== Process terminating with default action of signal 11 (SIGSEGV)
==3076==  Bad permissions for mapped region at address 0x11
==3076==    at 0x11: ???
==3076==    by 0xA26CCD: clone (clone.S:133)
==3076==  Address 0x11 is not stack'd, malloc'd or (recently) free'd
+1  A: 

Use valgrind or an equivalent memory-checking tool to figure it out. Stop guessing. Also, stop posting incomplete code, especially if you don't know whether it has a problem or not. The bug could be outside of this function. For example, maybe the semaphore isn't initialized.

From the valgrind output, I would guess that your pthread_create() line contains an invalid function pointer. So pthread jumps to that bogus address and crashes. Obviously there is no stack ...

BatchyX
The semaphore has been initialized, Valgrind output posted.
Kedar Soparkar
I would only trust you if valgrind were not complaining about it. Seriously, stop guessing.
BatchyX
I've put down the main code. I don't know what more I can post, without dumping the entire source file here.
Kedar Soparkar
@Kedar: Then do that. If the code is very large, post it on pastebin.com and post a link to that.
Adam Rosenfield
Have gdb run the program up to the pthread_create line, then use 'disassemble $eip, $eip+40' to show the assembly code of main that calls pthread_create. Also, maybe compile your code with -Wall -Wextra.
BatchyX
+1  A: 

All the important parts are missing in your code, but these are the most common reasons for stack corruption:

  • Storing a pointer to an element on the stack and using it after the object is already out of scope.
  • Buffer overrun, like having a char buffer[20] on the stack and writing outside the bounds (sprintf is a fantastic way to accomplish that).
  • Bad cast, i.e. having a base class A on the stack, casting it to a derived class and using it.
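
As a small illustration of the buffer-overrun bullet, here is a sketch of a bounded write into a fixed-size buffer. The helper name format_label and the message format are hypothetical, not taken from the poster's code; snprintf is the standard C function, and its return value is used to detect truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "id=<id> name=<name>" into out, never writing
 * more than out_size bytes (including the terminating NUL).
 * Returns 1 if the whole text fit, 0 if it was truncated or failed. */
int format_label(char *out, size_t out_size, int id, const char *name)
{
    assert(out != NULL && out_size > 0);

    /* sprintf(out, ...) would write past the end of a short buffer;
     * snprintf stops at out_size and reports how much it needed. */
    int needed = snprintf(out, out_size, "id=%d name=%s", id, name);

    return needed >= 0 && (size_t)needed < out_size;
}
```

Called with a char buffer[20] on the stack and sizeof buffer as the size, an over-long name is truncated instead of overwriting whatever sits next to the buffer, such as a saved return address.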
EboMike
His stack doesn't seem to be corrupted.
BatchyX
+3  A: 

Write a new source file with a main function that does the same things as the main you posted here, except that rather than using pthread_create it just calls the function directly. See if you can recreate the issue independently of threads. From the way things look, your semaphores should still work just fine in a single-threaded environment.

If this still fails you will have an easier time debugging it.
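
A rough sketch of that single-threaded experiment might look like the following. The body of runner here is only a stand-in: the elided work and the condition checks from the question are replaced by a hypothetical counter (passes_left) so that the loop terminates, and run_without_threads is a made-up driver name. The real function body would be pasted in when trying to reproduce the crash.

```c
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdio.h>

sem_t x;

/* Hypothetical stand-in for the condition checks in the question. */
static int passes_left = 2;

void *runner(void *unused)
{
    (void)unused;
    do {
        sem_wait(&x);
        if (--passes_left <= 0) {
            printf("LEAVING...\n");
            sem_post(&x);
            return NULL;
        }
        sem_post(&x);
    } while (1);
}

/* Same setup as the posted main, but runner is called directly on the
 * main thread instead of being started through pthread_create. */
int run_without_threads(void)
{
    int rc = sem_init(&x, 0, 1);
    assert(rc == 0);

    void *result = runner(NULL);

    sem_destroy(&x);
    return result == NULL ? 0 : 1;
}
```

If the crash still happens when runner returns here, GDB gets an ordinary call stack through main to work with, which should make the corrupted frame much easier to locate.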

Since you said that calling exit rather than returning did not yield the error, it would suggest that you have corrupted the return address that is on the stack when runner is started. By calling exit you don't rely on this memory area to get to an exiting function (if you had returned, pthread_exit would have been called by the pthread library code that had called runner). I think that the valgrind output is not 100% accurate -- not due to any fault in valgrind, but because the place where you are triggering the error, coupled with the type of error you are triggering, makes it very difficult to be sure who called what.
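
To make the point about returning concrete, here is a small sketch showing that returning from a thread's start function and calling pthread_exit explicitly hand the same kind of value to pthread_join. The function names and the two marker integers are hypothetical. The difference that matters for this bug is that the return path has to go through the return address saved on the thread's stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int by_return = 1;
static int by_exit = 2;

/* Leaves by returning: control goes back through the saved return address
 * into the pthread library, which then exits the thread on our behalf. */
void *leave_by_return(void *unused)
{
    (void)unused;
    return &by_return;
}

/* Leaves by calling pthread_exit directly, so nothing is ever returned
 * through this function's stack frame. */
void *leave_by_pthread_exit(void *unused)
{
    (void)unused;
    pthread_exit(&by_exit);
    return NULL; /* not reached */
}

/* Runs start on a new thread and returns the int its exit value points to,
 * or -1 if the thread could not be created or joined. */
int exit_value_of(void *(*start)(void *))
{
    assert(start != NULL);

    pthread_t thread;
    void *result = NULL;

    if (pthread_create(&thread, NULL, start, NULL) != 0)
        return -1;
    if (pthread_join(thread, &result) != 0)
        return -1;
    return result != NULL ? *(int *)result : -1;
}
```

Both variants end the thread cleanly when the stack is intact. If only the returning variant crashes in the real program, that is consistent with something having overwritten the saved return address.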

Some gcc flags you may be interested in:

-fstack-protector-all -Wstack-protector

The warning option doesn't work without the -f option here.

You may also want to try:

-fno-omit-frame-pointer
nategoose