I have an odd bug in my program: malloc() appears to be causing a SIGSEGV, which, as far as I understand, does not make any sense. I am using a library called simclist for dynamic lists.

Here is a struct that is referenced later:

typedef struct {
    int msgid;
    int status;
    void* udata;
    list_t queue;
} msg_t;

And here is the code:

msg_t* msg = (msg_t*) malloc( sizeof( msg_t ) );

msg->msgid = msgid;
msg->status = MSG_STAT_NEW;
msg->udata = udata;
list_init( &msg->queue );

list_init is where the program fails. Here is the code for list_init:

/* list initialization */
int list_init(list_t *restrict l) {
    if (l == NULL) return -1;

    srandom((unsigned long)time(NULL));

    l->numels = 0;

    /* head/tail sentinels and mid pointer */
    l->head_sentinel = (struct list_entry_s *)malloc(sizeof(struct list_entry_s));
    l->tail_sentinel = (struct list_entry_s *)malloc(sizeof(struct list_entry_s));
    l->head_sentinel->next = l->tail_sentinel;
    l->tail_sentinel->prev = l->head_sentinel;
    l->head_sentinel->prev = l->tail_sentinel->next = l->mid = NULL;
    l->head_sentinel->data = l->tail_sentinel->data = NULL;

    /* iteration attributes */
    l->iter_active = 0;
    l->iter_pos = 0;
    l->iter_curentry = NULL;

    /* free-list attributes */
    l->spareels = (struct list_entry_s **)malloc(SIMCLIST_MAX_SPARE_ELEMS * sizeof(struct list_entry_s *));
    l->spareelsnum = 0;

#ifdef SIMCLIST_WITH_THREADS
    l->threadcount = 0;
#endif

    list_attributes_setdefaults(l);

    assert(list_repOk(l));
    assert(list_attrOk(l));

    return 0;
}

According to the stack trace, the line l->spareels = (struct list_entry_s **)malloc(SIMCLIST_MAX_SPARE_ELEMS * is where the SIGSEGV occurs. I am using gdb/nemiver for debugging but am at a loss. The first time this function is called it works fine, but it always fails the second time. How can malloc() cause a SIGSEGV?

This is the stack trace:

#0  ?? () at :0
#1  malloc () at :0
#2  list_init (l=0x104f290) at src/simclist.c:205
#3  msg_new (msg_switch=0x1050dc0, msgid=8, udata=0x0) at src/msg_switch.c:218
#4  exread (sockfd=8, conn_info=0x104e0e0) at src/zimr-proxy/main.c:504
#5  zfd_select (tv_sec=0) at src/zfildes.c:124
#6  main (argc=3, argv=0x7fffcabe44f8) at src/zimr-proxy/main.c:210

Any help or insight is much appreciated!

+10  A: 

malloc can segfault, for example, when the heap is corrupted. Check that you are not writing anything beyond the bounds of any previous allocation.

laalto
And use valgrind! Other posters mentioned this tool, but sadly didn't get many upvotes. Nothing wrong with this answer, but any discussion of memory corruption is incomplete without an admonishment to use valgrind!
Andy Ross
+7  A: 

You have probably corrupted your heap somewhere before this call, either by a buffer overflow or by calling free with a pointer that wasn't allocated by malloc (or that was already freed).

If the internal data structures used by malloc get corrupted this way, malloc is using invalid data and might crash.
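To illustrate the kind of overflow meant here: a classic case is allocating strlen(s) bytes and then copying the string together with its terminator, which writes one byte past the block. A minimal sketch of the correct pattern follows. dup_string is a hypothetical helper made up for this example; it is not part of the asker's code or of simclist.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s, or NULL on failure.
 * The buggy variant would call malloc(strlen(s)) and then strcpy(),
 * writing the terminating '\0' one byte past the end of the block. */
char *dup_string(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL)
        return NULL;

    len = strlen(s);
    copy = malloc(len + 1);      /* +1 leaves room for the terminator */
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len + 1);    /* copies the terminator as well */
    return copy;
}
```

A one-byte overflow like the buggy variant often goes unnoticed until a later malloc() or free() call, which matches the symptom in the question.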

sth
+7  A: 

The memory violation probably occurs in another part of your code. If you are on Linux, you should definitely try valgrind. I would never trust my own C programs unless they pass valgrind.

+1  A: 

You should try to debug this code in isolation, to see if the problem is actually located where the segfault is generated. (I suspect that it is not).

This means:

#1: Compile the code with -O0, to make sure that gdb gets correct line numbering information.

#2: Write a unit test which calls this part of the code.

My guess is that the code will work correctly when used separately. You can then test your other modules in the same way, until you find out what causes the bug.

Using Valgrind, as others have suggested, is also a very good idea.

Jørgen Fogh
+2  A: 

There are a myriad ways of triggering a core dump from malloc() (and realloc() and calloc()). These include:

  • Buffer overflow: writing beyond the end of the allocated space (trampling control information that malloc() was keeping there).
  • Buffer underflow: writing before the start of the allocated space (trampling control information that malloc() was keeping there).
  • Freeing memory that was not allocated by malloc(). In a mixed C and C++ program, that would include freeing memory allocated in C++ by new.
  • Freeing a pointer that points part way through a memory block allocated by malloc() - which is a special case of the previous case.
  • Freeing a pointer that was already freed - the notorious 'double free'.
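A common defensive habit against the last item can be sketched as follows. FREE_AND_CLEAR is a hypothetical macro name chosen for this example. It only protects the one pointer variable you pass in; other copies of the same address can still be freed twice.

```c
#include <stdlib.h>

/* Hypothetical macro: frees p and then sets it to NULL, so that a
 * repeated call becomes free(NULL), which the C standard defines
 * as a no-op. */
#define FREE_AND_CLEAR(p) \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

Used as FREE_AND_CLEAR(msg); in place of free(msg);, a second release of the same variable is harmless rather than corrupting the heap.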

Using a diagnostic version of malloc(), or enabling diagnostics in your system's standard version, may help identify some of these problems. For example, it may be able to detect small underflows and overflows (because it allocates extra space to provide a buffer zone around the space that you requested), and it can probably detect attempts to free memory that was not allocated, memory that was already freed, or pointers part way through the allocated space, because it stores that information separately from the allocated space. The cost is that the debugging version takes more space. A really good allocator will be able to record the stack trace and line numbers to tell you where the allocation occurred in your code, or where the first free occurred.
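The buffer-zone idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular debugging allocator is implemented: the names guard_malloc, guard_check and guard_free, the 16-byte zone size and the 0xA5 fill pattern are all made up for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE 16   /* holds the requested size, padded for alignment */
#define GUARD_SIZE  16   /* bytes of padding on each side of the user data */
#define GUARD_BYTE  0xA5 /* pattern expected in the padding */

/* Block layout: [header][front guard][user data, n bytes][rear guard] */
void *guard_malloc(size_t n)
{
    unsigned char *base;

    if (n > SIZE_MAX - (HEADER_SIZE + 2 * GUARD_SIZE))
        return NULL;                       /* size computation would wrap */

    base = malloc(HEADER_SIZE + GUARD_SIZE + n + GUARD_SIZE);
    if (base == NULL)
        return NULL;

    memcpy(base, &n, sizeof n);            /* remember the requested size */
    memset(base + HEADER_SIZE, GUARD_BYTE, GUARD_SIZE);
    memset(base + HEADER_SIZE + GUARD_SIZE + n, GUARD_BYTE, GUARD_SIZE);
    return base + HEADER_SIZE + GUARD_SIZE;
}

/* Returns 1 if both guard zones are intact, 0 if either was trampled. */
int guard_check(const void *user)
{
    const unsigned char *p = user;
    const unsigned char *front = p - GUARD_SIZE;
    size_t n, i;

    memcpy(&n, p - GUARD_SIZE - HEADER_SIZE, sizeof n);
    for (i = 0; i < GUARD_SIZE; i++) {
        if (front[i] != GUARD_BYTE || p[n + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Checks the guard zones, frees the block, and returns the check result. */
int guard_free(void *user)
{
    int ok;

    if (user == NULL)
        return 1;
    ok = guard_check(user);
    free((unsigned char *)user - GUARD_SIZE - HEADER_SIZE);
    return ok;
}
```

An overflow or underflow of up to 16 bytes lands in a guard zone and is reported by guard_check or guard_free instead of silently damaging the real allocator's bookkeeping. An underflow large enough to reach the size header defeats this sketch, which is one reason real tools keep their records elsewhere.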

Jonathan Leffler
A: 

The code is problematic. It does not handle the case where malloc returns NULL. You simply assume that memory has been allocated for you when it may not have been. Writing through the resulting null pointer is undefined behavior and can crash the program or corrupt memory.
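A minimal sketch of a checked allocation follows. example_msg is a simplified stand-in for the question's msg_t with the list_t member left out, so that the example compiles without simclist; example_msg_new is a hypothetical name, not the asker's msg_new.

```c
#include <stdlib.h>

/* Simplified stand-in for the question's msg_t (no list_t member). */
typedef struct {
    int msgid;
    int status;
    void *udata;
} example_msg;

/* Hypothetical constructor: returns NULL instead of writing through
 * a null pointer when malloc() fails. */
example_msg *example_msg_new(int msgid, int status, void *udata)
{
    example_msg *msg = malloc(sizeof *msg);

    if (msg == NULL)
        return NULL;             /* caller decides how to recover */

    msg->msgid = msgid;
    msg->status = status;
    msg->udata = udata;
    return msg;
}
```

The same check would apply to the three unchecked malloc() calls in the list_init code shown in the question.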

steve