tags:

views:

639

answers:

2

Hi all,

I have created a TCP client that connects to a listening server. We implemeted TCP keep alive also. Some times the client crashes and core dumped. Below are the core dump traces.

Problem is in linux kernel version Update 4, kernel 2.6.9-42.0.10.

we had two core dumps.

(gdb) where
#0 0x005e77a2 in _dl_sysinfo_int80 () from /ddisk/d303/dumps/mhx239131/ld-
linux.so.2
#1 0x006c8bd1 in connect () from /ddisk/d303/dumps/mhx239131/libc.so.6
#2 0x08057863 in connect_to_host ()
#3 0x08052f38 in open_ldap_connection ()
#4 0x0805690a in new_connection ()
#5 0x08052cc9 in ldap_open ()
#6 0x080522cf in checkHosts ()
#7 0x08049b36 in pollLDEs ()
#8 0x0804d1cd in doOnChange ()
#9 0x0804a642 in main ()

(gdb) where
#0 0x005e77a2 in _dl_sysinfo_int80 () from /ddisk/d303/dumps/mhx239131/ld-
linux.so.2
#1 0x0068ab60 in __nanosleep_nocancel ( 
from /ddisk/d303/dumps/mhx239131/libc.so.6
#2 0x080520a2 in Sleep ()
#3 0x08049ac1 in pollLDEs ()
#4 0x0804d1cd in doOnChange ()
#5 0x0804a642 in main ()

We have tried to reproduce the problem in our environment, but we could not.

What would cause the core file?

Please help me to avoid such situation.

Thanks, Naga

+1  A: 

_dl_sysinfo_int80 is just a function which does a system call into the kernel. So the core dump is happening on a system call (probably the one used by connect in the first example and nanosleep in the second example), probably because you are passing invalid pointers.

The invalid pointers could be because the code which calls these functions being broken or because somewhere else in the program is broken and corrupting the program's memory.

Take a look at two frames above (frame #2) in the core dump for both examples and check the parameters being passed. Unfortunately, it seems you did not compile with debug information, making it harder to see them.

Additionally, I would suggest trying valgrind and seeing if it finds something.

CesarB
A: 

Your program almost cetainly did not coredump in either of the above places.

Most likely, you either have multiple threads in your process (and some other thread caused the core dump), or something external caused your process to die (such as 'kill -SIGABRT <pid>').

If you do have multiple threads, GDB 'info threads' and 'thread apply all where' are likely to provide further clues.