Hello, I'm trying to understand what's going wrong with a program running on HP-UX 11.11 that dies with a SIGSEGV (11, segmentation fault):

(gdb) bt
#0  0x737390e8 in _sigfillset+0x618 () from /usr/lib/libc.2
#1  0x73736a8c in _sscanf+0x55c () from /usr/lib/libc.2
#2  0x7373c23c in malloc+0x18c () from /usr/lib/libc.2
#3  0x7379e3f8 in _findbuf+0x138 () from /usr/lib/libc.2
#4  0x7379c9f4 in _filbuf+0x34 () from /usr/lib/libc.2
#5  0x7379c604 in __fgets_unlocked+0x84 () from /usr/lib/libc.2
#6  0x7379c7fc in fgets+0xbc () from /usr/lib/libc.2
#7  0x7378ecec in __nsw_getoneconfig+0xf4 () from /usr/lib/libc.2
#8  0x7378f8b8 in __nsw_getconfig+0x150 () from /usr/lib/libc.2
#9  0x737903a8 in __thread_cond_init_default+0x100 () from /usr/lib/libc.2
#10 0x737909a0 in nss_search+0x80 () from /usr/lib/libc.2
#11 0x736e7320 in __gethostbyname_r+0x140 () from /usr/lib/libc.2
#12 0x736e74bc in gethostbyname+0x94 () from /usr/lib/libc.2
#13 0x11780 in dnetResolveName (name=0x400080d8 "smtp.org.com", hent=0x737f3334) at src/dnet.c:64
..

The problem seems to be occurring somewhere inside libc! A system call trace ends with:

Connecting to server smtp.org.com on port 25
write(1, "C o n n e c t i n g   t o   s e ".., 51) .......................... = 51
open("/etc/nsswitch.conf", O_RDONLY, 0666) ............................... [entry]
open("/etc/nsswitch.conf", O_RDONLY, 0666) ................................... = 5
  Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
    Siginfo: si_code: I_NONEXIST, faulting address: 0x400118fc, si_errno: 0
    PC: 0xc01980eb, instruction: 0x0d3f1280
exit(11) [implicit] ............................ WIFSIGNALED(SIGSEGV)|WCOREDUMP

The last statement executed by the program:

struct hostent *him;
him = gethostbyname(name); // name == "smtp.org.com" as shown by gdb

Is this a problem with the system, or am I missing something? Any guidance for digging deeper would be appreciated.

Thx.

+2  A: 

Whenever this situation happens to me (unexpected segfault in a system lib), it is usually because I did something foolish somewhere else, e.g. a buffer overrun, a double delete on a pointer, etc.

In those instances where my mistake is not obvious, I use valgrind. Something like the following is usually sufficient:

valgrind -v --leak-check=yes --show-reachable=yes ./myprog

I assume valgrind may be used in HP-UX...

Throwback1986
No valgrind - valgrind is a simulator; it only works on particular architectures. (x86, AMD64, powerpc). And even then only on certain operating systems. It would be easier to port the application...
Douglas Leeder
I don't have valgrind on my HP system :-(
A: 

The (OS X) manpage says that gethostbyname() returns a pointer, but as far as I can tell it may not be allocating memory for that pointer. Do you need to malloc() first? Try this:

struct hostent *him = malloc(sizeof(struct hostent));
him = gethostbyname(name);
...
free(him);

Does that work any better?

EDIT: I tested this and it's probably wrong. Granted I used the bare string "stmp.org.com" instead of a variable, but both versions (with and without malloc()ing) worked on OS X. Maybe HP-UX is different.

Chris Lutz
That can't be it - the structure you've allocated is not passed to the function...
Douglas Leeder
gethostbyname() returns a pointer that it allocates, which can be problematic in multithreaded code and is why on some systems there's also gethostbyname_r(). But I digress.
Craig S
It was a longshot guess, but I believe it was wrong. Oh well. I tried.
Chris Lutz
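For reference, here is a minimal sketch of the usual calling pattern the comments above describe: the caller does not allocate or free anything, because gethostbyname() returns a pointer to storage owned by the library. The helper name print_canonical_name is made up for illustration, and this does not reproduce or explain the HP-UX crash itself.

```c
#include <netdb.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post).
 * Returns 0 if the name resolved, -1 otherwise. */
int print_canonical_name(const char *name)
{
    /* No malloc(): the library hands back a pointer to storage it owns. */
    struct hostent *him = gethostbyname(name);

    if (him == NULL) {
        /* gethostbyname() reports failures through h_errno, not errno. */
        fprintf(stderr, "lookup of %s failed (h_errno=%d)\n", name, h_errno);
        return -1;
    }

    printf("%s -> %s\n", name, him->h_name);
    return 0;   /* no free(him): a later lookup may reuse the same storage */
}
```

Calling print_canonical_name("localhost") either prints the canonical name or reports a lookup failure, depending on the resolver configuration of the machine.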
+1  A: 

Well, it's just a rough guess and might be completely wrong, but is the DNS string (smtp.org.com) null-terminated?

Seq
This would be my guess as well.
Douglas Leeder
gdb says it is: 0x400080e8: 111 'o' 109 'm' 0 '\000'. After the m of .com, there is a \000.
+1  A: 

Your stack trace is in malloc which almost certainly means that somewhere you corrupted one of malloc's data structures. As a previous answer said, you likely have a buffer overrun or underrun and corrupted one of the items allocated off the heap.

Another explanation is that you tried to do a free on something that didn't come from the heap, but that's less likely--that would probably have crashed right in free.

Jared Oberhaus
It's also possible that he's passed a non-null-terminated string to some syscall, or passed a free()'d string to a system call.
slacy
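Where valgrind is not available, one crude way to hunt for the kind of overrun described in this answer is to put guard bytes after a suspect allocation and check them later. The sketch below is only an illustration under that assumption: guarded_alloc, guard_intact, GUARD_LEN and GUARD_BYTE are made-up names, not part of any libc, and the check only catches writes that land in the guard area.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8      /* hypothetical: number of canary bytes */
#define GUARD_BYTE 0xA5   /* hypothetical: canary pattern */

/* Allocates n user bytes followed by GUARD_LEN canary bytes. */
void *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_LEN);

    if (p != NULL)
        memset(p + n, GUARD_BYTE, GUARD_LEN);   /* canary just past the user area */
    return p;
}

/* Returns 1 if the canary after the n user bytes is intact, 0 if overwritten. */
int guard_intact(const void *ptr, size_t n)
{
    const unsigned char *p = (const unsigned char *)ptr + n;
    size_t i;

    for (i = 0; i < GUARD_LEN; i++)
        if (p[i] != GUARD_BYTE)
            return 0;
    return 1;
}
```

Checking guard_intact() right after each suspicious write narrows down which call overran its buffer, instead of waiting for a later malloc() to crash on the damaged heap.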
A: 

Long story short: vsnprintf corrupted my heap under HP-UX 11.11. vsnprintf was introduced in C99 (ISO/IEC 9899:1999) and "is equivalent to snprintf, with the variable argument list" (§7.19.6.12.2); for snprintf (§7.19.6.5.2): "If n is zero, nothing is written". Well, HP-UX 11.11 doesn't comply with this specification. When the 2nd arg == 0, the formatted arguments are still written into the buffer given as the 1st arg, which of course corrupts the heap (I allocate no space when maxsize==0, given that nothing should be written).

HP's manual pages are unclear ("It is the user's responsibility to ensure that enough storage is available."); nothing is said regarding the case of maxsize==0. Nice trap. At the very least, the WARNINGS section of the man page should warn standard-compliant users.

It's a chicken-and-egg problem: vsnprintf is variadic, so to meet the "user's responsibility to ensure that enough storage is available", the user must first know how much space is needed. And the best way to find that out is to call vsnprintf with 2nd arg == 0: it should then return the amount of space required and write nothing. Well, except HP's! One way to use vsnprintf to determine the needed space despite this standard violation: malloc 1 byte more for your buffer (1st arg) and call vsnprintf(buf+buf.length,1,..). This only puts a \0 in the new byte you allocated. Silly, but effective. If you're under wchar conditions, malloc(sizeof..).

Anyway, the workaround is trivial: never call v/snprintf under HP-UX with maxsize==0! I now have a happy, stable program!
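The sizing trick described above could be sketched roughly as follows. This is an illustration, not the code from the actual program: format_alloc is a made-up helper name, it uses a 1-byte scratch buffer rather than an extra byte at the end of an existing buffer, and it assumes that vsnprintf returns the required length on truncation (C99 behaviour) and that va_copy is available, neither of which is guaranteed on every old toolchain.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): formats into a freshly
 * malloc'd string and never passes a size of 0 to vsnprintf.
 * The caller must free() the result. Returns NULL on failure. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char probe[1];      /* 1-byte scratch buffer used instead of size 0 */
    char *out;
    int needed;

    va_start(ap, fmt);
    va_copy(ap2, ap);   /* each v*printf call consumes its va_list */

    /* Sizing pass: with size 1, only the terminating '\0' is stored. */
    needed = vsnprintf(probe, sizeof probe, fmt, ap);
    va_end(ap);
    if (needed < 0) {   /* error, or a libc that does not report the length */
        va_end(ap2);
        return NULL;
    }

    /* Formatting pass into a buffer of exactly the required size. */
    out = malloc((size_t)needed + 1);
    if (out != NULL)
        vsnprintf(out, (size_t)needed + 1, fmt, ap2);
    va_end(ap2);
    return out;
}
```

For example, format_alloc("%s-%d", "fooo", 42) yields the string "fooo-42" on a C99-conforming libc.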

Thanks to all contributors.


Heap corruption through vsnprintf under HP-UX B11.11

This program prints "@@" under Linux/Cygwin/.. It prints "@fooo@" under HP-UX B11.11:

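#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define S 2

void f (const char *fmt, ...) {
        va_list ap;
        char buf[S];

        /* Start from an empty string so any write by vsnprintf is visible. */
        memset(buf, 0, S);

        va_start(ap, fmt);
        /* Size 0: per C99, nothing may be written to buf. */
        (void) vsnprintf(buf, 0, fmt, ap);
        va_end(ap);

        /* "@@" if buf was left untouched, "@fooo@" on HP-UX B11.11. */
        printf("@%s@\n", buf);
}

int main (void) {
        f("%s", "fooo");
        return 0;
}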