views:

426

answers:

3

I have some ANSI C code that I developed on my Mac, but when I tried running it on our school's Linux servers I get a segfault.

The specific line that is causing me trouble is a getc from a file pointer.

The file does exist.

Here is the method in question:

// inits lists with all data in fp file pointer
// returns # of lines read
int init_intlists(FILE *fp, INTLIST *lists[]) {
    int c, ctr;

    ctr = 0;

    // need to use a linked list to store current number
    // for non 1-digit numbers...
    INTLIST *cur_num = NULL;
    int cur_num_len = 0;
    while ((c = getc(fp)) != EOF){
        if(c != '\n' && c != ' '){
            c = c - 48;
            if(cur_num == NULL){
                cur_num = init_intlist(c);
            } else {
                list_append(cur_num, &c);
            }
            cur_num_len++;
        } else if(c == ' ' || c == '\n'){
            // we reached a space, meaning we finished
            // reading a contiguous block of digits
            // now we need to figure out what we actually read...
            int num = 0;
            INTLIST *ptr;
            ptr = cur_num;
            while(cur_num_len != 0){
                cur_num_len--;
                num += pow(10, cur_num_len) * ptr->datum;
                ptr = ptr->next;
            }    

            if(lists[ctr] == NULL){
                // init new list
                lists[ctr] = init_intlist(num);
            } else {
                // append to existing
                list_append(lists[ctr], &num);
            }

            // clear cur_num to read the next one
            cur_num_len = 0;
            list_delete(cur_num);
            cur_num = NULL;
        }

        if(c == '\n') {
            // newline reached - increment to fill in next list
            ctr++;
        }
    }    

    return ctr;
}

The call to init_intlists that causes the segfault starts thusly:

    FILE *fp = (FILE *)malloc(sizeof(FILE));
    FILE *base_vector_fp = (FILE *)malloc(sizeof(FILE));

    parse_args(argc, argv, fp, base_vector_fp);

    if(fp == NULL || base_vector_fp == NULL){
        fprintf(stderr, "Critical error, could not load input files\n");
        return 1;
    }

    INTLIST *lines[MAX_LINES] = {};
    INTLIST *base_vectors[MAX_LINES] = {};

    int lines_read = init_intlists(fp, lines);

and parse_args looks like:

FILE *load_file(char *filename) {
    FILE *fp;

    fp = fopen(filename, "r");

    if(fp == NULL){
        fprintf(stderr, "File %s does not seem to exist.\n", filename);
        return NULL;
    }

    // XXX Does this memory leak?
    // fp is never fclose()'d
    return fp;
}

void parse_args(int argc, char *argv[], FILE *fp, FILE *base_vector_fp) {
    char *prog = argv[0];
    if (argc != 3){
        fprintf(stderr, "Wrong number of arguments supplied.\nUse: %s <data_filename>     <base_vector_filename>\n", prog);
        free(fp);
        free(base_vector_fp);
        fp = NULL;
        base_vector_fp = NULL;
        exit(1);
    }

    char *filename = argv[1];
    *fp = *load_file(filename);

    char *base_vector_filename = argv[2];
    *base_vector_fp = *load_file(base_vector_filename);
}

So when I try invoking this on my Mac, it works perfectly fine and it reads the file in like it should and I'm able to operate on it and get the correct answers for my assignment.

However, when I try running it on Linux, I get a segfault when it tries to getc in the init_intlists subroutine.

I have verified that the files I supply for input exist and are world-readable(umask 755). I have tried with both absolute and relative paths. I have tried several different input files as well.

I have tried using gcc 4.2 and gcc 3.4 on the Linux server and both produce a binary executable that will cause a segfault with any given input files.

Here is the version information between the two different versions of gcc:

Mac OS X:

me@dinosaurhunter ~> gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5465)

Linux:

me@janus:~/assignment_1$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4)

I invoke the compiler using the same Makefile on both OS X and Linux. The end invocation of gcc winds up looking like this:

gcc  -Wall -g  -c src/common_file_io.c src/main.c src/intlist.c
gcc  -Wall -g  common_file_io.o main.o intlist.o -lreadline -lm  -o bin/myprogram

Any ideas? I am at a complete loss, as is my professor.

+10  A: 

You aren't supposed to allocate your own FILE objects, they are generally opaque objects managed by libc. Don't free() them either, that's done by fclose(3). Although theoretically you could allocate one and do a structure assign and have it work, it would be better not to fight the library and just pass around the reference like everyone else does. The library may or may not keep state that isn't in the FILE structure, and peeking inside or dereferencing the entire struct is sufficiently bad style that the implementors may actually assume you never do it.

If you want to return a FILE * you can either use it as a return pointer value as you did in one case or use a double-indirect pointer: FILE *fp; f(&fp);.

Hmm, I just noticed that C99 actually specifies this in 7.19.13:

6 The address of the FILE object used to control a stream may be significant; a copy of a FILE object need not serve in place of the original.

With this they are serving notice that a FILE * may really just be a magic cookie.

DigitalRoss
To be more specific, you never work with `FILE` - you only work with `FILE*`, and you only get valid values for that from a call to one of `fopen`, `freopen`, or `tmpfile`.
Pavel Minaev
Strange that his professor didn't notice?
Carl Norum
Good answer! Note that for clarification to the OP, you might specifically note that the dubious malloc'ing of file pointers might be the cause of the segfault.
csl
"Although theoretically you could allocate one and do a structure assign and have it work" - I'm not sure that's even guaranteed to be doable. Is the pointer guaranteed to be a valid pointer to `FILE*` (and not just a handle that's casted back and forth by library)? Even if it is a valid pointer, is there a guarantee that `FILE` is a type that is fully defined (and not just `typedef struct FILE FILE`)?
Pavel Minaev
@PM: C99 appears to stop just short of answering those questions, but I certainly agree with you; it's a bad idea to do anything other than OO-style ops with the reference.
DigitalRoss
A: 

You should not copy the result offopen()into aFILEobject, and in fact, you should notmallocaFILEobject at all. You should always usefopen()to allocate theFILEcontrol object.

TheFILEobject is opaque, and verily it contains much that is hidden to mere mortals. The implementation is free to put all sorts of things in it, such as pointers to other control structures, etc.

Loadmaster
+1  A: 

The other answers are correct - treat a FILE * as an opaque handle that you copy around, don't try to copy its contents. Specifically you can fix your code as follows:

Remove the call to malloc when you initialise fp and base_vector_fp:

FILE *fp = NULL;
FILE *base_vector_fp = NULL;

Pass a pointer to these pointers to parse_args, so that it can update the pointer values:

parse_args(argc, argv, &fp, &base_vector_fp);

And change parse_args to update the FILE * objects in the caller, rather than try and work with the FILE objects:

void parse_args(int argc, char *argv[], FILE **fp, FILE **base_vector_fp) {
    char *prog = argv[0];
    if (argc != 3){
        fprintf(stderr, "Wrong number of arguments supplied.\nUse: %s <data_filename>     <base_vector_filename>\n", prog);
        exit(1);
    }

    char *filename = argv[1];
    *fp = load_file(filename);

    char *base_vector_filename = argv[2];
    *base_vector_fp = load_file(base_vector_filename);
}
caf
Dur, some reason pointers-to-pointers weren't readily coming to my mind as a solution to this problem. I knew I shouldn't manually stick things in the contents of the FILE object but was having trouble figuring out the correct idiom.
ashgromnies