This is single-threaded code.

Specifically, it is the ahocorasick Python extension module (installed with easy_install ahocorasick).

I isolated the problem to a trivial example:

import ahocorasick
t = ahocorasick.KeywordTree()
t.add("a")

When I run it in gdb, everything is fine, and the same happens when I type these statements into the Python interactive interpreter. However, when I run the script normally, I get a segfault.

To make it even weirder, the line that causes the segfault (identified by core dump analysis) is a plain int increment (see the bottom of the function body).

I'm completely stuck at this point. What can I do?

int
aho_corasick_addstring(aho_corasick_t *in, unsigned char *string, size_t n)
{
    aho_corasick_t* g = in;
    aho_corasick_state_t *state,*s = NULL;
    int j = 0;

    state = g->zerostate;

    // As long as we have transitions follow them
    while( j != n &&
           (s = aho_corasick_goto_get(state,*(string+j))) != FAIL )
    {
        state = s;
        ++j;
    }

    if ( j == n ) {
        /* dyoo: added so that if a keyword ends up in a prefix
           of another, we still mark that as a match.*/
        aho_corasick_output(s) = j;
        return 0;
    }

    while( j != n )
    {
        // Create new state
        if ( (s = xalloc(sizeof(aho_corasick_state_t))) == NULL )
            return -1;
        s->id = g->newstate++;
        debug(printf("allocating state %d\n", s->id)); /* debug */ 
        s->depth = state->depth + 1;

        /* FIXME: check the error return value of
           aho_corasick_goto_initialize. */
        aho_corasick_goto_initialize(s);

        // Create transition
        aho_corasick_goto_set(state,*(string+j), s);
        debug(printf("%u -> %c -> %u\n",state->id,*(string+j),s->id));
        state = s;
        aho_corasick_output(s) = 0;
        aho_corasick_fail(s) = NULL;
        ++j;                                 // <--- HERE!
    }

    aho_corasick_output(s) = n;

    return 0;
}
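Separately from the crash being asked about, one latent problem in the function above is worth ruling out: when n == 0 the first loop never runs, so s is still NULL when the if ( j == n ) branch executes aho_corasick_output(s) = j. With the one-character keyword "a", s can only reach that branch after a successful transition, so this alone does not explain the reported segfault. The sketch below is a hypothetical reduced model, not the module's code: node_t and follow_existing are made-up names, and a plain 256-entry child array stands in for aho_corasick_goto_get. It shows the first phase written so that the node being marked is always the non-NULL state.

```c
#include <stddef.h>

/* Hypothetical reduced node: a 256-entry child table stands in for
   aho_corasick_goto_get(), and output mirrors aho_corasick_output(). */
typedef struct node {
    struct node *next[256];
    size_t output;
} node_t;

/* First phase of the insert: follow existing transitions as far as the
   keyword allows. Returns 1 if the whole keyword was already present (and
   marks it), 0 if new states are still needed. *last receives the deepest
   node reached, which is never NULL: with n == 0 it is the root itself. */
int follow_existing(node_t *root, const unsigned char *string, size_t n,
                    size_t *matched, node_t **last)
{
    node_t *state = root;
    size_t j = 0;   /* same type as n, so j != n compares like with like */

    while (j != n && state->next[string[j]] != NULL) {
        state = state->next[string[j]];
        ++j;
    }

    *matched = j;
    *last = state;
    if (j == n) {
        state->output = j;   /* write through state, never a possibly-NULL s */
        return 1;
    }
    return 0;
}
```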
A:

There are other tools that can find faults which do not necessarily crash the program: Valgrind, Electric Fence, Purify, Coverity, and lint-like tools may be able to help you.

In some cases you might need to build your own Python for these tools to be usable. Also, for memory-corruption problems, there is (or was; I haven't built extensions in a while) an option to make Python call the system allocator directly instead of using its own allocator, pymalloc (the --without-pymalloc configure option when building Python).

Mattias Nilsson
A: 

Have you tried translating that while loop to a for loop? Maybe there's some subtle misunderstanding with the ++j that will disappear if you use something more intuitive.
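The for-loop translation suggested here can be sketched as follows. This is a minimal, hypothetical rewrite, not the module's code: trie_node_t, trie_t, trie_addstring and trie_free are made-up names, a 256-entry child array replaces aho_corasick_goto_get/aho_corasick_goto_set, and calloc replaces xalloc. Both while loops become for loops over a size_t index, so j has the same type as n and the increment lives in the loop header.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified trie node; the real module uses
   aho_corasick_state_t with its own goto table. */
typedef struct trie_node {
    struct trie_node *next[256]; /* one child slot per byte value */
    size_t output;               /* keyword length ending here, 0 if none */
    int id;
    int depth;
} trie_node_t;

typedef struct {
    trie_node_t *zerostate;
    int newstate;
} trie_t;

/* For-loop rendering of the two while loops in aho_corasick_addstring.
   Returns 0 on success, -1 on allocation failure. */
int trie_addstring(trie_t *g, const unsigned char *string, size_t n)
{
    trie_node_t *state = g->zerostate;
    size_t j;

    /* Phase 1: follow existing transitions as far as they go.
       The j < n test runs first, so string[j] is never read past the end. */
    for (j = 0; j < n && state->next[string[j]] != NULL; ++j)
        state = state->next[string[j]];

    /* Phase 2: create one new state per remaining byte. */
    for (; j < n; ++j) {
        trie_node_t *s = calloc(1, sizeof *s); /* zeroes next[] and output */
        if (s == NULL)
            return -1;
        s->id = g->newstate++;
        s->depth = state->depth + 1;
        state->next[string[j]] = s;            /* create transition */
        state = s;
    }

    /* state is never NULL here, even when n == 0. */
    state->output = n;
    return 0;
}

/* Recursively release a node and all of its children. */
void trie_free(trie_node_t *node)
{
    if (node == NULL)
        return;
    for (size_t c = 0; c < 256; ++c)
        trie_free(node->next[c]);
    free(node);
}
```

A for loop will not by itself cure a crash caused by memory corruption elsewhere, which is why the tools in the other answer are still worth running; in this sketch the rewrite mainly removes the int/size_t comparison and the reliance on s in the final assignment.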

Chris