tags:

views:

52

answers:

1

I have a text file with data in the form:

Lee AUS 2 103 2 62 TRUE
Check AUS 4 48 0 23 FALSE
Mills AUS 8 236 0 69 FALSE

I need to each line into a struct like, however I'd like to avoid using fixed length arrays (the problem with fgets as far as I can tell):

struct Data
{
    char *sname;
    char *country;
    int *a;
    int *b;
    int *c;
    int *d;
    char *hsisno;
};

I am very new to C. Should I use fscanf, or fgets?

+2  A: 

fscanf stands for "file scan formatted" and user data is about as unformatted as you can get.

You should never use naked "%s" format strings on data where you don't have absolute control over what can be read.

The best solution is to use fgets to read a line since this allows you to prevent buffer overflow.

Then, once you know the size of your line, that's the maximum size of each string that you will require. Use sscanf to your heart's content to get the actual fields.

One final thing. It's probably a bit wasteful having int* types for the integers, since you know they have a specific maximum size already. I'd use the non-pointer variant, something like:

struct Data {
    char *sname; char *country;
    int a; int b; int c; int d;
    char *hsisno;
};

By way of example, here's some safe code:

#include <stdio.h>
#include <string.h>

// Here's all the stuff for a linked list of your nodes.

typedef struct sData {
    char *sname; char *country; char *hsisno;
    int a; int b; int c; int d;
    struct sData *next;
} Data;
Data *first = NULL; Data *last = NULL;

#define MAXSZ 100
int main (void) {
    char line[MAXSZ], sname[MAXSZ], country[MAXSZ], hsisno[MAXSZ];
    int a, b, c, d;
    FILE *fIn;
    Data *node;

    // Open the input file.

    fIn = fopen ("file.in", "r");
    if (fIn == NULL) {
        printf ("Cannot open file\n");
        return 1;
    }

    // Process every line.

    while (fgets (line, sizeof(line), fIn) != NULL) {
        // Check line for various problems (too short, too long).

        if (line[0] == '\0') {
            printf ("Line too short\n");
            return 1;
        }

        if (line[strlen (line)-1] != '\n') {
            printf ("Line starting with '%s' is too long\n", line);
            return 1;
        }

        line[strlen (line)-1] = '\0';

        // Scan the individual fields.

        if (sscanf (line, "%s %s %d %d %d %d %s",
            sname, country, &a, &b, &c, &d, hsisno) != 7)
        {
            printf ("Line '%s' didn't scan properly\n", line);
            return 1;
        }

        // Allocate a new node to hold data.

        node = malloc (sizeof (Data));
        if (node == NULL) {
            printf ("Ran out of memory\n");
            return 1;
        }

        node->sname = strdup (sname);
        node->country = strdup (country);
        node->a = a;
        node->b = b;
        node->c = c;
        node->d = d;
        node->hsisno = strdup (hsisno);
        node->next = NULL;
        if (first != NULL) {
            last->next = node;
            last = node;
        } else {
            first = node;
            last = node;
        }
    }

    fclose (fIn);

    // Output the list for debugging.

    node = first;
    while (node != NULL) {
        printf ("'%s' '%s' %d %d %d %d '%s'\n",
            node->sname, node->country, node->a, node->b,
            node->c, node->d, node->hsisno);
        node = node->next;
    }

    return 0;
}

which reads in your file and stores it in a linked list. It outputs:

'Lee' 'AUS' 2 103 2 62 'TRUE'
'Check' 'AUS' 4 48 0 23 'FALSE'
'Mills' 'AUS' 8 236 0 69 'FALSE'

at the end, as expected.


I've done a whole series of answers on the pitfalls of using *scanf functions on non-controlled data (enter user:14860 fgets into the search box above), some of which (here, here and here, for example) include a perennial favourite function of mine, getLine, for safer user input.

paxdiablo
Thank you. But doesn't fgets require you to know the size of your line, as it needs the buffer size parameter?
Dominic Bou-Samra
Usually people use put a size larger than they'd ever expect to occur in the file.
GWW
Okay, thats what I was going to do, but figured that's a pretty bad code smell.
Dominic Bou-Samra
Not as bad as having your code crash :-) In any case, you can use a dynamic buffer as well. One trick I've used before is to save the file pointer before reading the line and, if the line was too big (fgets returns a buffer with no newline), keep reading characters until I find the newline then dynamically allocate enough memory for that, back up the file pointer and try again.
paxdiablo
Epic answer thank you paxdiablo!
Dominic Bou-Samra
Epic answers are my speciality. What other forum (other than running my own blog, but that's too much trouble) would allow me to spew forth such voluminous essays without people becoming bored and just wandering off in the middle? :-)
paxdiablo
Adopt schizophrenia. I assume you won't get bored of talking to yourself ;)
Dominic Bou-Samra
It should be noted that using `fgets` does not require imposing an arbitrary line length limit. Before calling `fgets(s,n,f)`, set `s[n-2]='\n'` and then checking if `s[n-2]!='\n'` after the call will tell you if you ran out of space. You can then enlarge the buffer and attempt to read more of the line into it. There are other conceptually simpler checks but they may require iterating over the whole string/checking its length. My check is `O(1)` and fast.
R..