tags:

views:

108

answers:

3

I've come across a rather charming bug that I'm trying to wrap my head around.

I feel like I've seen this before, but this time I want to understand why this is happening.

I have:

int i;
int debug = 0;
for(i = 0; i < buf_end; i++) {
    do_some_buffer_work();
    if(something_is_true()) {
         do_something_important();
         printf("debug is %i, i is %i", debug++, i);
    }
}

printf("end\n");

and I get as output:

debug is 1, i is 55
debug is 2, i is 55

so there was one point where the loop was executed twice with the same value for i. nothing I do within the loop directly touches i. Furthermore I doubt this is traditional memory stomping as the value is always the same. I suspected that it was something causing the program counter to move around (since sometimes with bad linking you get similar bugs,) until I did the following test:

int i;
static int debug;
for(i = 0; i < buf_end; i++) {
    do_some_buffer_work();
    if(something_is_true()) {
         do_something_important();
         printf("debug is %i, i is %i\n", debug++, i++);
         printf("debug is %i, i is %i\n", debug++, i++);
         printf("debug is %i, i is %i\n", debug++, i++);
    }
}
printf("end\n");

and I got this interesting output:

debug is 0, i is 55
debug is 1, i is 56
debug is 2, i is 57
debug is 3, i is 55
debug is 4, i is 56
debug is 5, i is 57
end

So here it is obvious that two complete iterations with the same i have executed, but the debug variable was not affected. It would appear that the value is being cached and restored for some reason. I had a hunch and changed the debug variable to non-static and got this:

int i;
int debug = 0;
for(i = 0; i < buf_end; i++) {
    do_some_buffer_work();
    if(something_is_true()) {
         do_something_important();
         printf("debug is %i, i is %i\n", debug++, i++);
         printf("debug is %i, i is %i\n", debug++, i++);
         printf("debug is %i, i is %i\n", debug++, i++);
    }
}
printf("end\n");

and I got this interesting output:

debug is 0, i is 55
debug is 1, i is 56
debug is 2, i is 57
debug is 0, i is 55
debug is 1, i is 56
debug is 2, i is 57
end

So it looks like the variables that live on the stack are reset to the beginning of the 55th iteration.

I am sure that the bug is in one of these do_something_important() calls - which deal with buffer reading - but this bug has taken on a character of its own and I think I owe it some respect to find out more about its nature before I squash it. So please rather than trying to help me fix it let me know if you have some clue as to why it is happening. More specifically, what can be changed in the program state to 'reset' values like this?

Edit: I'm sorry if people are annoyed that I left out the functions. They are quite large and have references to other functions, but the main reason I left it out was because I don't care about fixing this problem; I want to know how I might recreate it in the simplest way possible.

2nd Edit: Here's the immediate function where the blues happen. Including the referenced function and all subfunctions and definitions is probably around 500 lines so I'm not doing that here.

static int find_headers_search(FCALParseContext *fpc, uint8_t *buf, int buf_size,
                               int search_start)

{
    FCALFrameInfo fi;
    int end_offset = -1, size = 0, i;
    uint8_t *header_buf;

    int debug = 0;
    for (i = 0; i < buf_size - 1; i++) {
        if ((AV_RB16(buf + i) & 0xFFFE) == 0xFFF8) {
                av_log(NULL,AV_LOG_DEBUG,"predebug%i i %i\n",debug, i);
            header_buf = fcal_fifo_read_wrap(fpc, search_start + i,
                                             MAX_FRAME_HEADER_SIZE,
                                             &fpc->wrap_buf,
                                             &fpc->wrap_buf_allocated_size);

            if (frame_header_is_valid(header_buf, &fi)) {
                av_log(NULL,AV_LOG_DEBUG,"frame num %u bufstart %u, size %u, end %u i %i\n", (unsigned int)fi.frame_or_sample_num,
                       search_start, buf_size, search_start + buf_size -1, i);
                FCALHeaderMarker **end_handle = &fpc->headers;

                size = 0;
                while (*end_handle) {
                    end_offset =  (*end_handle)->offset;
                    end_handle = &(*end_handle)->next;
                    size++;
                }

                *end_handle = av_mallocz(sizeof(FCALHeaderMarker));
                if (!*end_handle) {
                    av_log(fpc->avctx, AV_LOG_ERROR,
                           "couldn't allocate FCALHeaderMarker\n");
                    return AVERROR(ENOMEM);
                }
                (*end_handle)->fi     = fi;
                (*end_handle)->offset = search_start + i;
                /* The actual size of the linked list is now size + 1 */
                update_sequences(fpc, size - FCAL_MAX_SEQUENTIAL_HEADERS,
                                 FFMIN(size, FCAL_MAX_SEQUENTIAL_HEADERS),
                                 *end_handle);
                fpc->nb_headers_found++;
                size++;
                av_log(NULL,AV_LOG_DEBUG,"debug%i i %i\n",debug++, i++);
                size = 0;
                while (*end_handle) {
                    end_offset =  (*end_handle)->offset;
                    end_handle = &(*end_handle)->next;
                    size++;
                }

                *end_handle = av_mallocz(sizeof(FCALHeaderMarker));
                if (!*end_handle) {
                    av_log(fpc->avctx, AV_LOG_ERROR,
                           "couldn't allocate FCALHeaderMarker\n");
                    return AVERROR(ENOMEM);
                }
                (*end_handle)->fi     = fi;
                (*end_handle)->offset = search_start + i;
                /* The actual size of the linked list is now size + 1 */
                update_sequences(fpc, size - FCAL_MAX_SEQUENTIAL_HEADERS,
                                 FFMIN(size, FCAL_MAX_SEQUENTIAL_HEADERS),
                                 *end_handle);
                fpc->nb_headers_found++;
                size++;
                av_log(NULL,AV_LOG_DEBUG,"debug%i i %i\n",debug++, i++);
                av_log(NULL,AV_LOG_DEBUG,"debug%i i %i\n",debug++, i++);
                av_log(NULL,AV_LOG_DEBUG,"debug%i i %i\n",debug++, i++);
            }
        }
    }
    return size;
}
+3  A: 

There's a buffer, there's a buf_end, there's some 'buffer work'. All invisible in the snippet. Clearly there's some invisible code that writes past the end of the buffer, stomping the local variables debug and i. Set a data breakpoint on the buffer end and you'll find it in a minute or two, usually.

Hans Passant
But, if it was stomping the local variables I wouldn't expect it to always end up being the same values at the start of the iteration.
fsmc
@fsmc: All anyone here can do is guess! Have you tried debugging yet?
Oli Charlesworth
Why do you assume that the stomping is not repeatable? Why do you assume it is not? These are silly questions, use the debugger.
Hans Passant
Running with different input gives different values of the initial printed i variable. The post bug result always goes back to the initial. I know this will not help me solve the bug, but I feel that it will teach me something about computers.
fsmc
The behaviour looks too deterministic to me for this hypothesis to be likely, but the code *is* heavily elided, and @fsmc should use a debugger, a far more efficient debug method that either printf or "ask a forum".
Clifford
Computers behave in-deterministic? They don't, the are machines. Only bugs do, produced by humans.
Hans Passant
@Hans: Indeed, and we are after all discussing a bug introduced by a human (and executed by a machine). I am not sure what your point is? The content of unitialised memory is non-determistic, as is the result of trashing the stack - especially under code changes (which is why adding debug code is ill-advised in preference to using a less intrusive debugger). I don't discount the hypothesis, merely do not believe it to be the most likley. Applying Occam's Rasor.
Clifford
@Clifford: The result of trashing the stack is not non-deterministic, so long as the function doing the trashing always does the same thing (e.g. if you forget to account for NULL terminator in size of a char array on the stack; `strcpy` will always trash something adjacent to the array with `'\0'`).
Oli Charlesworth
@Oli:Given @fsmc's subsequent comments and your observation about the first code fragment, I am moving closer to this possibility, but it seems that @fsmc's debug skills are not great, so I wonder if we'll ever know; as you say we can only guess! I was not suggesting that a buffer overrun might not produce identical results at each run, merely that *typically* all bets are off at determining what will happen. Often the return address is corrupted and the code will disappear into la-la land (technical term).
Clifford
@Clifford: Yes, if that happens, then you're absolutely right; pretty much anything could happen.
Oli Charlesworth
@Clifford: Also, like I said I ran valgrind with memcheck which if I'm not mistaken helps catch buffer overruns with a memory guard. I could be wrong - I'm not very good at using debuggers, as you say.
fsmc
@fsmc: If GDB is your debugger I can only sympathise ;)
Clifford
+2  A: 

It looks like recursion/reentrancy to me. The following hypothesis would explain the observed behaviour:

do_some_buffer_work() under some condition calls the function containing this code. This would explain the static being consistent and the locals restarting. The reason you only see it for certain values of i is because something_is_true() is only true for those values of i.

You should probably step the code in your debugger. Observing the call stack at a break-point on your printf() will quickly determine whether recursion has occurred.

Clifford
Reentrancy sounds like the right word. What can cause this to happen?
fsmc
The first code snippet would imply that it's not due to recursion; `debug` is incrementing.
Oli Charlesworth
@Oli: Good point; but given the brutal eliding of the code, I reckon all bets are off on the code posted being the necessarily the code executed!
Clifford
@fsmc: Recursive calls - a function calling itself, or a function calling a function above it in the call stack. Or possibly multi-threading, where a separate thread may call the same function. The symptoms do not see to suggest multi-threading as highly likely however.
Clifford
+1  A: 

This may or not be helpful (not knowing a thing about what do_some_buffer_work(), something_is_true(), or do_something_important() do kind hampers analysis a bit). but may try having 2 loop counter variables in different places and check for when they differ:

int i;
static int ii;

int debug = 0;
for(i = 0, ii = 0; i < buf_end; i++, ii++) {
    if (i != ii) debugBreak();

    do_some_buffer_work();
    if (i != ii) debugBreak();

    if(something_is_true()) {
        if (i != ii) debugBreak();
         do_something_important();
         printf("debug is %i, i is %i", debug++, i);
    }

    if (i != ii) debugBreak();
}

This might give you closer context to when things go bad.

Michael Burr