I have written a multithreaded program in which a thread forks a child process, and several modules are loaded through this child process.

During testing, the process (running on Solaris) aborted once with a segmentation fault. When I went through the core dump, I was shocked to see that the fork() system call on Solaris caused this segmentation fault.

Below is the stack trace at the time of the fork() abort:

(l@5) stopped in (unknown) at 0xfe524970
0xfe524970:     <bad address 0xfe524970>
(/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where
  [1] 0xfe524970(0xfe524970, 0x0, 0xffffffff, 0x1, 0x0, 0x0), at 0xfe52496f
  [2] run_prefork(0xfecc04b8, 0xfecc04d0, 0x242f4, 0xfea5d3c8, 0x0, 0x0), at 0xfec97ce8
  [3] _ti_fork1(0x1, 0x1ab18, 0x0, 0x0, 0x0, 0x0), at 0xfea5d3c8
  [4] _ti_fork(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfea5d50c

Can anyone explain why the fork() system call on Solaris causes this behaviour?

A: 

Addition:

One possible scenario (taking C++ as an example):

  • thread A creates an object
  • thread B calls a method that calls fork()
  • thread A deletes the object while thread B has not yet reached fork()
  • the memory addresses are then invalid during the actual call to fork()

Depending on the timing, this may be more or less likely to happen. You may be able to force this situation by introducing some sleeps (if you think this might be the case at all).
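If this scenario is the cause, one way to rule it out is to make sure the object cannot be freed while the forking thread still needs it. Below is a minimal C sketch, assuming POSIX threads. `struct module_info`, `deleter_thread`, `fork_worker` and `run_race_demo` are hypothetical names. A mutex is only one way to enforce the object's lifetime; reference counting is another.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MODULE_PATH_MAX 64

/* Hypothetical object created by one thread and used by another. */
struct module_info {
    char path[MODULE_PATH_MAX];
};

static struct module_info *shared_obj;
static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread A: deletes the object, but only while holding obj_lock. */
static void *deleter_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&obj_lock);
    free(shared_obj);
    shared_obj = NULL;
    pthread_mutex_unlock(&obj_lock);
    return NULL;
}

/* Thread B: copy what the child needs while holding obj_lock, then fork
   without touching memory that thread A may free concurrently.
   Returns the child's exit status, or -1 on error. */
static int fork_worker(void)
{
    char path[MODULE_PATH_MAX] = "";
    int status;

    pthread_mutex_lock(&obj_lock);
    if (shared_obj != NULL)
        memcpy(path, shared_obj->path, sizeof path);
    pthread_mutex_unlock(&obj_lock);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: a real program would pass `path` to exec here; this
           sketch only exits. shared_obj is never dereferenced. */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Run thread A concurrently with thread B's code (on the calling thread).
   Returns 0 if the child exited normally with status 0. */
int run_race_demo(void)
{
    pthread_t a;

    shared_obj = malloc(sizeof *shared_obj);
    if (shared_obj == NULL)
        return -1;
    strcpy(shared_obj->path, "example-module"); /* hypothetical value */

    if (pthread_create(&a, NULL, deleter_thread, NULL) != 0) {
        free(shared_obj);
        shared_obj = NULL;
        return -1;
    }
    int rc = fork_worker();
    pthread_join(a, NULL);
    return rc == 0 ? 0 : -1;
}
```

Because thread B copies the data it needs under the lock before calling fork(), it no longer matters whether thread A frees the object before or after the fork.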

Another possible cause is a hardware defect. Before looking any further, I would run a RAM-testing tool and check for problems. Let us know if this turns out to be the case.

A third possibility is a bug in the system code. However, that would not explain why it sometimes works and sometimes does not, so I think it is unlikely to be the issue.

PS: the address 0xfe52496f is odd, i.e. not a multiple of four, which is unusual for a code address (SPARC instructions are always 4-byte aligned). That is also a hint in the direction of defective RAM. I hope I am wrong; on the other hand, if I am right, you know what to do...
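The alignment argument is easy to check mechanically. Below is a minimal sketch that assumes only the SPARC rule that every instruction is a 4-byte word starting on a 4-byte boundary. Under that rule, a valid instruction address has its two low bits clear, and 0xfe52496f ends in binary 11. `is_sparc_insn_aligned` is a hypothetical helper.

```c
#include <stdint.h>

/* Returns 1 if `pc` could be the address of a SPARC instruction,
   i.e. it is a multiple of 4 (two low bits clear), otherwise 0. */
int is_sparc_insn_aligned(uintptr_t pc)
{
    return (pc & 0x3u) == 0;
}
```

With the addresses from the trace, 0xfe524970 (where the process stopped) passes this check, while 0xfe52496f (reported for frame [1]) does not.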

jdehaan
Can you elaborate on how a hardware defect could cause a segmentation fault in the fork() system call?
I am not sure I understand what you mean. You are asking: if it is a memory defect, why does it show up specifically in a call to fork()? Maybe because the library always gets mapped around the same physical address? In the meantime I have added another scenario that may be helpful to you (see the answer text again).
jdehaan
A: 

I think your stack or stack pointer may have been corrupted at the point where you call fork(). Either that, or you have actually used up your stack space and the stack pointer is just short of the limit when you call fork().

Calling other functions, or allocating a moderate amount of memory with alloca() and setting that area to 0 with memset() just before the call to fork(), would reveal whether this is the case, because the error would then show up earlier.
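The alloca() probe described above could look roughly like this minimal sketch. `fork_with_stack_probe` is a hypothetical name, and the probe size is up to you. The sketch writes the buffer through a volatile pointer instead of calling memset(), so that the compiler cannot drop the writes as dead stores.

```c
#include <alloca.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical debugging helper: claim and write `probe_bytes` of stack
   immediately before fork(). If the stack is nearly exhausted or the
   stack pointer is already corrupt, the fault happens in the loop below,
   one step before fork(), instead of inside fork() itself.
   Returns the child's exit status, or -1 on error. */
int fork_with_stack_probe(size_t probe_bytes)
{
    int status;

    if (probe_bytes > 0) {
        volatile char *probe = alloca(probe_bytes);
        /* Volatile stores: kept by the compiler even though the buffer
           is never read afterwards. */
        for (size_t i = 0; i < probe_bytes; i++)
            probe[i] = 0;
    }

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: nothing else to do in this sketch */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the crash moves from inside fork() into the probe loop, that points to the stack rather than to fork() itself.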

It might also be that, if you are forking from a non-main thread of the process (I'm not familiar with Solaris's threading model, so I could be spouting gibberish), you have somehow specified or allocated the stack of the thread that calls fork() in a way that makes it inaccessible to the new process after the fork.

Is this repeatable? Does it happen consistently?

nategoose
A: 

Mixing fork and threads is generally ill-advised. This is because the forked process will only have a single thread: the thread that called fork. None of the other threads exist in the new process, which means that any shared memory resources are in an unknown state. For example, another thread could have been holding a mutex at the time of the fork, and in the child nobody will ever release it. There are mechanisms to mitigate this, such as pthread_atfork, but as a general rule, when working with multiple threads you should only fork in order to call exec as soon as possible. Are you segfaulting in the parent process or in the new child process?
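Both mitigations can be combined in a minimal sketch. pthread_atfork() handlers take a lock before fork() and release it in both processes afterwards, and the child immediately execs. `table_lock`, the handler functions and `run_shell_command` are hypothetical names. In real code, the lock would be the one that protects your shared state.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical lock protecting some shared state (e.g. a module table). */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;
static int atfork_rc;

/* prepare: runs in the parent just before fork(), so no other thread can
   be halfway through modifying the protected state. */
static void before_fork(void) { pthread_mutex_lock(&table_lock); }
/* parent/child: run right after fork() in each process and release the
   lock, so both copies start with it unlocked. */
static void after_fork_parent(void) { pthread_mutex_unlock(&table_lock); }
static void after_fork_child(void) { pthread_mutex_unlock(&table_lock); }

/* Register the handlers exactly once; registering them twice would make
   the prepare step lock the (non-recursive) mutex twice and deadlock. */
static void register_handlers(void)
{
    atfork_rc = pthread_atfork(before_fork, after_fork_parent,
                               after_fork_child);
}

/* Fork and immediately exec /bin/sh -c cmd.
   Returns the command's exit status, or -1 on error. */
int run_shell_command(const char *cmd)
{
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
    int status;

    pthread_once(&handlers_once, register_handlers);
    if (atfork_rc != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only this thread exists here, so call nothing but
           async-signal-safe functions and exec right away. */
        execve("/bin/sh", argv, environ);
        _exit(127);             /* reached only if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_shell_command("exit 3")` should return 3.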

Logan Capaldo