Hi,

I don't understand why atoi() is working for every entry but the first one. I have the following code to parse a simple .csv file:

void ioReadSampleDataUsers(SocialNetwork *social, char *file) {
    FILE *fp = fopen(file, "r");

    if(!fp) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }

    char line[BUFSIZ], *word, *buffer, name[30], address[35];
    int ssn = 0, arg;

    while(fgets(line, BUFSIZ, fp)) {
        line[strlen(line) - 2] = '\0';

        buffer = line;
        arg = 1;

        do {
            word = strsep(&buffer, ";");

            if(word) {
                switch(arg) {
                    case 1:
                        printf("[%s] - (%d)\n", word, atoi(word));
                        ssn = atoi(word);
                        break;
                    case 2:
                        strcpy(name, word);
                        break;
                    case 3:
                        strcpy(address, word);
                        break;
                }

                arg++;
            }
        } while(word);

        userInsert(social, name, address, ssn);
    }

    fclose(fp);
}

And the .csv sample file is this:

900011000;Jon Yang;3761 N. 14th St
900011001;Eugene Huang;2243 W St.
900011002;Ruben Torres;5844 Linden Land
900011003;Christy Zhu;1825 Village Pl.
900011004;Elizabeth Johnson;7553 Harness Circle

But this is the output:

[900011000] - (0)
[900011001] - (900011001)
[900011002] - (900011002)
[900011003] - (900011003)
[900011004] - (900011004)

What am I doing wrong?

+4  A: 

I'd guess that your CSV file was saved in UTF-8 format with a BOM (byte order mark) at the beginning, which is confusing atoi(). You can verify this by looking at the file in a hex editor, or by printing the first few bytes of word.
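One way to look at the first few bytes of word is to render them in hex. This is a minimal sketch, not code from the answer; hexPrefix is a hypothetical helper name. If the file has a BOM, calling it on word in case 1 should show EF BB BF in front of the digits on the first line only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the first n bytes of s as two-digit hex
 * values separated by spaces into out, which must hold at least
 * 3 * n + 1 chars. Stops early at the terminating '\0'. */
void hexPrefix(const char *s, size_t n, char *out) {
    size_t i;

    out[0] = '\0';
    for (i = 0; i < n && s[i] != '\0'; i++) {
        /* cast so bytes >= 0x80 are not sign-extended when char is signed */
        sprintf(out + 3 * i, "%02X ", (unsigned char)s[i]);
    }
    /* drop the trailing space, if anything was written */
    if (i > 0)
        out[3 * i - 1] = '\0';
}
```

For example, `char hex[16]; hexPrefix(word, 4, hex); printf("%s\n", hex);` would print the hex of up to four leading bytes of the field.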

A BOM for UTF-8 is three bytes with the values 0xEF, 0xBB, 0xBF.
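Those three bytes are just the code point U+FEFF (the byte order mark character) encoded as UTF-8. Every code point from U+0800 to U+FFFF takes three UTF-8 bytes with the bit pattern 1110xxxx 10xxxxxx 10xxxxxx. The sketch below fills in that pattern; utf8Encode3 is a hypothetical name, and it deliberately handles only the three-byte range and does no surrogate checking. (In UTF-16 the same mark would be the two bytes FE FF or FF FE instead.)

```c
#include <stdint.h>

/* Hypothetical helper: encodes a code point in U+0800..U+FFFF as the
 * three-byte UTF-8 sequence 1110xxxx 10xxxxxx 10xxxxxx.
 * No range or surrogate validation is done. */
void utf8Encode3(uint32_t cp, unsigned char out[3]) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));         /* top 4 bits    */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F)); /* middle 6 bits */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));        /* low 6 bits    */
}
```

For U+FEFF this gives 0xE0 | 0xF = 0xEF, 0x80 | 0x3B = 0xBB and 0x80 | 0x3F = 0xBF.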

If possible, save the file as ASCII. If not, add code to detect and skip these bytes.
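A minimal sketch of the "detect and skip" option, assuming fp is a regular, seekable file that was just opened (rewind() would not help on a pipe). skipUtf8Bom is a hypothetical helper, not part of any library: it reads the first three bytes and, if they are not a BOM, rewinds so the first fgets() sees the file unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if the stream starts with a UTF-8 BOM (EF BB BF),
 * consume it and return 1. Otherwise rewind to the start and return 0.
 * Assumes fp is seekable and positioned at the start of the file. */
int skipUtf8Bom(FILE *fp) {
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];
    size_t got = fread(head, 1, sizeof head, fp);

    if (got == sizeof head && memcmp(head, bom, sizeof bom) == 0)
        return 1;   /* BOM consumed: next read starts at the first field */

    rewind(fp);     /* no BOM (or a short file): back to the beginning */
    return 0;
}
```

In ioReadSampleDataUsers() it would be called once, after the fopen() error check and before the while(fgets(...)) loop.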

interjay
Especially likely once you consider the results of strlen().
sharth
Just saved the file as ANSI and that solved it. I don't think the input needs to be saved as UTF-8.
Nazgulled
I think you mean UTF-16. If it were UTF-8, a file with byte values limited to 0-127 would be plain ASCII and would not need a byte order mark.
nategoose
@nategoose: No, I mean UTF-8. If it were UTF-16, none of the values would be displayed correctly. And just because all the values are 0-127 doesn't mean the BOM won't get added; it depends on the program creating the file (e.g. Windows Notepad always saves the BOM when saving UTF-8).
interjay
+2  A: 

My guess is that the file starts with a byte order mark. atoi() sees those bytes as non-digits, so it returns 0.
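Because atoi() has no way to report failure, a 0 from a BOM-prefixed field looks like a valid number. A sketch using strtol() with an end pointer, which does flag the problem; parseIntField is a hypothetical helper name, and it treats any trailing characters in the field as an error.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses the whole string s as a base-10 int.
 * Returns 1 and stores the value in *out on success; returns 0 if s has
 * no digits, has trailing junk, or is out of int range. */
int parseIntField(const char *s, int *out) {
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')   /* no digits parsed, or trailing junk */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}
```

In the question's case 1, `ssn = atoi(word);` could become `if (!parseIntField(word, &ssn)) fprintf(stderr, "bad SSN field: [%s]\n", word);`, so the first line is reported instead of silently becoming 0.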

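/* cast to unsigned char: plain char is usually signed, so line[0] == 0xEF would never be true */
if ((unsigned char)line[0] == 0xEF && (unsigned char)line[1] == 0xBB &&
    (unsigned char)line[2] == 0xBF) {
    buffer = line + 3; /* BOM present: start parsing after it (place this after buffer = line;) */
}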
Dave Hinton