Hello, my server daemon works fine on most machines; however, on one I am getting:

malloc.c:3074: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1)
 - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) ||
 ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct 
malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 
1)))&& ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.

gdb backtrace:

#4  0x002a8300 in sYSMALLOc (av=<value optimised out>, bytes=<value optimised out>) at malloc.c:3071
#5  _int_malloc (av=<value optimised out>, bytes=<value optimised out>) at malloc.c:4702
#6  0x002a9898 in *__GI___libc_malloc (bytes=16) at malloc.c:3638
#7  0x0804d575 in xmpp_ctx_new (mem=0x0, log=0x0) at src/ctx.c:383
#8  0x0804916e in main (argc=1, argv=0xbffff834) at ../src/adminbot.c:277

Any ideas what else to try? I am unable to find a bug in my code; it could be a bug in the XMPP library, and I need to determine that.

Thanks.

+1  A: 

The assertion almost certainly indicates some kind of memory corruption prior to a call to malloc. Given that the assertion is tripping in xmpp_ctx_new, which appears to be a very early call in the libstrophe XMPP library, I'd say it's very likely that the bug is in your code (though it may not be if you're allocating several XMPP contexts - not sure if there's any reason to do that).

If you're only allocating one XMPP context, you can isolate the bug to your code by inserting a call to malloc(sizeof(xmpp_ctx_t)) just before the call to xmpp_ctx_new. If the assertion then fires inside your own malloc call, the heap was already corrupted before libstrophe ran, so the problem isn't in libstrophe. (Incidentally, I'm pretty sure the problem won't be in this call to xmpp_ctx_new: mem=0x0 looked likely to cause problems, so I googled the source of the function and saw that it basically reduces to malloc and a few initializers. Reading the source is generally a good strategy for looking for bugs in OSS.)
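A minimal sketch of that probe, assuming a helper named heap_probe (hypothetical, not part of libstrophe). It uses the 16-byte request size shown in frame #6 of the backtrace, which also works if the definition of xmpp_ctx_t is not visible to your code and sizeof cannot be used:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: performs one small allocation and frees it again.
 * If the heap is already corrupted, glibc may abort inside this malloc
 * call instead of inside the library call that follows it.
 * Returns 1 if the allocation succeeded, 0 if malloc returned NULL. */
int heap_probe(size_t nbytes, const char *label)
{
    assert(label != NULL);

    void *p = malloc(nbytes);
    if (p == NULL) {
        fprintf(stderr, "heap_probe(%s): malloc(%zu) returned NULL\n",
                label, nbytes);
        return 0;
    }
    free(p);
    fprintf(stderr, "heap_probe(%s): ok\n", label);
    return 1;
}

/* Intended use in main(), right before the failing call:
 *
 *     heap_probe(16, "before xmpp_ctx_new");
 *     ctx = xmpp_ctx_new(NULL, NULL);
 */
```

If the process aborts before "ok" is printed for a given label, the corruption happened earlier in your own code. Moving the probe to earlier points in main() can narrow down where, though a probe that passes does not prove the heap is intact.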

Aidan Cully
+1  A: 

This is almost certainly due to a heap corruption bug in your code (writing just before or just after an allocated block).

Since you are apparently on Linux, the tool to use here is Valgrind. It should point you straight at the problem, and it should do so even on machines where your daemon "works".

Trying anything other than Valgrind for this kind of problem is likely a waste of time.
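For illustration, here is a sketch of the kind of bug meant by "writing just after an allocated block", using a hypothetical dup_string helper. The buggy allocation is left in a comment; it is an assumed example, not something taken from the daemon in question:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of s,
 * or NULL if the allocation fails. The caller frees the result. */
char *dup_string(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);

    /* Buggy variant: malloc(len) leaves no room for the terminating
     * '\0', so the copy below would write one byte past the end of the
     * block. That byte can land in the allocator's bookkeeping for the
     * neighbouring chunk, and the damage may only show up in a later,
     * unrelated malloc call. */
    /* char *copy = malloc(len); */

    char *copy = malloc(len + 1);   /* correct: room for the '\0' */
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len + 1);       /* copies the string and its '\0' */
    return copy;
}
```

With the buggy line enabled, running the program under Valgrind (for example `valgrind ./your_daemon`, where the binary name is a placeholder) should report an invalid write of size 1 at the memcpy, along with where the too-small block was allocated.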

Employed Russian