tags:

views:

72

answers:

2

Hello. I am parsing a text (css) file using fscanf. The basic goal is simple; I want to pull out anything that matches this pattern:

@import "some/file/somewhere.css";

So I'm using fscanf, telling it to read and discard everything up to a '@' character and then store everything until it reaches a ';' character. Here's the function that does this:

char* readDelimitedSectionAsChar(FILE *file)
{
char buffer[4096];

int charsRead;
do
{
    fscanf(file, "%*[^@] %[^;]", buffer, &charsRead);

} while(charsRead == 4095);

char *ptr = buffer;
return ptr;
}

I've created a buffer that should be able to hold 4095 characters, as I understand it. However, I'm discovering that this is not the case. If I have a file that contains a matching string that's long, like this:

@import "some/really/really/really/long/file/path/to/a/file";

That gets truncated to 31 characters using a buffer of char[4096]. (If I use printf to check the value of buffer, I find that the string is cut short.)

If I increase the buffer size, more of the string is included. I was under the impression that one character takes one byte (though I am aware this is affected by encoding). I am trying to understand what's going on here.

Ideally, I'd like to be able to set the buffer as large as it needs to be "on the fly" --- that is, have fscanf just create a buffer big enough to store the string. Can this be done? (I know of the %as flag for GNU, but this is a Mac application for OS 10.5/10.6 and I'm unsure if that will work on this platform.)

+1  A: 

Your buffer is local to the function. You assign a pointer to it, but when the caller accesses the pointer, buffer no longer exists. Anything can happen.

So, don't do that.

And scanf probably isn't the right tool for the job. I'd try getc, or fgets instead.

char *readDelimitedSectionAsChar(char *buf, size_t n, char firstChar, char lastChar, FILE *f);
pmg
+2  A: 

The main problem you have is that you're returning a pointer to a local buffer on the stack, which is dangling (and so overwritten by the next call you make). You also have a potential buffer overflow. You mention the 'a' option, which would help a lot, but its unfortunately a GNU extension which isn't generally available.

Second, you have this extra option to scanf, &charsRead which will never be written to as there's no % for it in the format string. So charsRead will always be random garbage -- which means you loop will (probably) just run once, or (rarely) loop forever. Try something like

char* readDelimitedSectionAsChar(FILE *file)
{
    char buffer[4096], term[2] = "", *rv = 0;
    int len = 0;

    fscanf(file, "%*[^@]");
    while (term[0] != ';' && !feof(file)) {
        if (fscanf(file, "%4095[^;]%1[;]", buffer, term) > 0) {
            int read = strlen(buffer);
            rv = rv ? realloc(rv, len+read+1) : malloc(read+1);
            strcpy(rv+len, buffer);
            len += read;
        }
    }
    return rv;
}

This is still broken in that it will misbehave if you run out of memory (which can easily happen if you feed it a huge malformed file with an @ in the beginning and no ;),

Chris Dodd