views: 801
answers: 4

Dear all,

I want to read a given input file line by line, process each line (i.e. its words), and then move on to the next line.

So I am using fscanf(fptr, "%s", words) to read each word, and it should stop once it encounters the end of the line.

But this does not seem to be possible with fscanf alone, I guess, so please tell me what to do.

I should read all the words in the given line (i.e. until the end of the line is encountered) and then move on to the next line, repeating the same process.

+9  A: 

Use fgets(). Yeah, the link is to cplusplus, but the function comes from the C stdio.h.

You may also use sscanf() to read words from the string, or just strtok() to split it into words.
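
For illustration, here is a minimal sketch of that approach (the file name "input.txt" and the buffer size are placeholder assumptions, not part of the answer):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];                     /* one line of input at a time */
    FILE *fp = fopen("input.txt", "r"); /* hypothetical input file */
    if (fp == NULL)
        return 1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* split the line into whitespace-separated words */
        char *word = strtok(line, " \t\n");
        while (word != NULL) {
            printf("word: %s\n", word);
            word = strtok(NULL, " \t\n");
        }
    }
    fclose(fp);
    return 0;
}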


In response to the comment: this behavior of fgets() (leaving the \n in the string) lets you determine whether the actual end of line was encountered. Note that fgets() may also read only part of the line from the file if the supplied buffer is not large enough. In your case, just check for a \n at the end and remove it if you don't need it. Something like this:

// actually you'll get str contents from fgets()
char str[MAX_LEN] = "hello there\n";
size_t len = strlen(str);
if (len && str[len-1] == '\n') {
    str[len-1] = 0;
}

Simple as that.

dragonfly
But if I print the line using printf("-%s-", line) after reading it with fgets(), it prints the '\n' as well, i.e. an extra newline character. What is this, and how do I resolve it?
AGeek
@Young: I answered your comment in the answer body.
dragonfly
+1 fgets is preferred to fscanf because of the better control: since the input can be anything, it's easy to get a buffer overrun with fscanf.
Anders K.
str points to a read-only string (because you initialized it with a literal), so it should be qualified as const. strlen()'s return type is size_t, so I think len should also be a size_t. It isn't a problem here, but it makes the code more generic. I also think it's more explicit to call strchr().
Bastien Léonard
Similar to my answer and already upvoted, so I removed mine.
DevSolar
@Bastien Léonard: thanks for your remarks. I agree on size_t (updated the answer). const is not needed, since the string is not actually read-only; the initialization is just a sample. The purpose of this code is to demonstrate "chomping" the '\n'. What do you mean by "it's more explicit to call strchr()"?
dragonfly
dragonfly: I thought that str was initialized like char* str = "hello there\n", where "hello there\n" yields a const char*. See http://c-faq.com/charstring/strlitinit.html.
Bastien Léonard
dragonfly: You want to find the first occurrence of '\n', right? strchr() does exactly that. When I see this strlen() trick, I always need some time to figure out how it works.
Bastien Léonard
@Bastien Léonard: Oh, I see. I agree that strchr() seems more explicit, but I'm so used to the strlen() trick :) Moreover, using strlen() makes it explicit that '\n', if present, must be the last character.
dragonfly
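
For reference, a minimal sketch of the strchr() variant discussed in the comments above (str is assumed to hold the line returned by fgets()):

char *nl = strchr(str, '\n');  /* find the newline left by fgets(), if any */
if (nl != NULL) {
    *nl = '\0';                /* chop the line at that point */
}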
+2  A: 

Given the buffering inherent in all the stdio functions, I would be tempted to read the stream character by character with getc(). A simple finite state machine can identify word boundaries, and line boundaries if needed. An advantage is the complete lack of buffers to overflow, aside from whatever buffer you collect the current word in if your further processing requires it.
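
As an illustration, here is a minimal sketch of that idea: a tiny state machine driven by getc() that reports word and line boundaries (the word-length limit is an arbitrary assumption, not part of the answer):

#include <stdio.h>

int main(void)
{
    int c;
    char word[64];      /* arbitrary word-length limit for this sketch */
    size_t n = 0;
    int in_word = 0;    /* state: are we currently inside a word? */

    while ((c = getc(stdin)) != EOF) {
        if (c == ' ' || c == '\t' || c == '\n') {
            if (in_word) {                 /* word boundary reached */
                word[n] = '\0';
                printf("word: %s\n", word);
                n = 0;
                in_word = 0;
            }
            if (c == '\n')                 /* line boundary reached */
                printf("-- end of line --\n");
        } else if (n < sizeof word - 1) {
            word[n++] = (char)c;           /* accumulate the current word */
            in_word = 1;
        }
    }
    return 0;
}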

You might want to do a quick benchmark comparing the time required to read a large file completely with getc() vs. fgets()...

If an outside constraint requires that the file really be read a line at a time (for instance, if you need to handle line-oriented input from a tty), then fgets() probably is your friend, as other answers point out. But even then the getc() approach may be acceptable, as long as the input stream is running in line-buffered mode, which is common for stdin when it is attached to a tty.

Edit: To have control over the buffer on the input stream, you might need to call setbuf() or setvbuf() to force it to a buffered mode. If the input stream ends up unbuffered, then using an explicit buffer of some form will always be faster than getc() on a raw stream.
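
As a sketch, forcing full buffering on stdin with setvbuf() might look like this (the 64 KB size is an arbitrary choice for illustration):

#include <stdio.h>

int main(void)
{
    /* must be called before any other operation on the stream */
    if (setvbuf(stdin, NULL, _IOFBF, 64 * 1024) != 0) {
        fputs("setvbuf failed\n", stderr);
    }
    /* ... now read from stdin with getc() as usual ... */
    return 0;
}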

Best performance would probably use a buffer related to your disk I/O, at least two disk blocks in size and probably a lot more than that. Often even that performance can be beaten by arranging for the input to be a memory mapped file and relying on the kernel's paging to read and fill the buffer as you process the file as if it were one giant string.
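
A minimal POSIX sketch of the memory-mapping approach (mmap() is not standard C, and the file name here is just a placeholder):

#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("input.txt", O_RDONLY);   /* placeholder file name */
    if (fd < 0)
        return 1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return 1;
    }

    /* map the whole file read-only; the kernel pages it in on demand */
    const char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        close(fd);
        return 1;
    }

    /* process the file as one big character range, e.g. count lines */
    size_t lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            lines++;
    printf("lines: %zu\n", lines);

    munmap((void *)data, st.st_size);
    close(fd);
    return 0;
}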

Regardless of the choice, if performance is going to matter then you will want to benchmark several approaches and pick the one that works best in your platform. And even then, the simplest expression of your problem may still be the best overall answer if it gets written, debugged and used.

RBerteig
Buffering is often a good thing. Using getc() instead of fgets() may be *much* slower.
dragonfly
It may not be as much slower as you think because input streams are usually already buffered by the standard library.
RBerteig
+1  A: 

but this is not possible in fscanf,

It is, with a bit of wickedness ;)

Update: More clarification on evilness

but unfortunately a bit wrong. I assume "[^\n]%*[^\n]" should read "[^\n]%*". Moreover, one should note that this approach will strip whitespaces from the lines. – dragonfly

Note that "%" xstr(MAXLINE) "[^\n]" reads up to MAXLINE characters, which can be anything except the newline character (i.e. \n). The second part of the specifier, i.e. %*[^\n], rejects anything (that's why the * character is there) if the line has more than MAXLINE characters, up to but NOT including the newline character. The newline character tells scanf to stop matching. What if we did as dragonfly suggested? The only problem is that scanf would not know where to stop and would keep suppressing assignment until the next newline is hit (which is another match for the first part). Hence you would trail by one line of input when reporting.

What if you wanted to read in a loop? A little modification is required. We need to add a getchar() to consume the unmatched newline. Here's the code:

#include <stdio.h>

#define MAXLINE 255

/* stringify macros: these work only in pairs, so keep both */
#define str(x) #x
#define xstr(x) str(x)

int main() {
    char line[ MAXLINE + 1 ];
    /* 
       Wickedness explained: we read from `stdin` to `line`.
       The format specifier is the only tricky part: We don't
       bite off more than we can chew -- hence the specification 
       of maximum number of chars i.e. MAXLINE. However, this
       width has to go into a string, so we stringify it using  
       macros. The careful reader will observe that once we have
       read MAXLINE characters we discard the rest of the line up
       to (but not including) the newline; the getchar() below
       consumes the newline itself.
     */
    int n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
    if (!feof(stdin)) {
        getchar();
    }
    while (n == 1) {
        printf("[line:] %s\n", line);
        n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
        if (!feof(stdin)) {
            getchar();
        }
    } 
    return 0;
}
dirkgently
Eep. Wicked indeed!
RBerteig
Please, can you explain this program?
AGeek
@Young: I have added comments. Can you point out which part(s) you don't understand? I can then edit my post with appropriate information. Cheers!
dirkgently
+1, Very nice wickedness, but unfortunately a bit wrong. I assume "[^\n]%*[^\n]" should read "[^\n]%*". Moreover, one should note that this approach will strip whitespaces from the lines.
dragonfly
@dragonfly: No. I strongly suggest that you run this code. I'll edit my post to clarify.
dirkgently
@dirkgently: I actually ran the code. I wrapped it in a loop to read all lines from the file. In that case your code outputs only the first line. Mine works if compiled under `cl`, but behaves the same if compiled with `gcc`. So your code really is an evilness :)
dragonfly
@dragonfly: You need a getchar() for the newline.
dirkgently
@dirkgently: Yep, you're right. It looks nice now. Though I would change the loop condition to !feof(stdin), since n will get zeroed on empty lines (and the previous line will get doubled in the output).
dragonfly
@dragonfly: That is because we do not zero out the buffer before every read. Surely you can do that.
dirkgently
+3  A: 

If you are working on a system with the GNU extensions available, there is something called getline (man 3 getline) which allows you to read a file line by line, and getline will allocate extra memory for you if needed. The man page contains an example, which I modified to split each line into words using strtok (man 3 strtok).

#define _GNU_SOURCE   /* for getline() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* for strtok() */

int main(void)
{
    FILE * fp;
    char * line = NULL;
    size_t len = 0;
    ssize_t read;

    fp = fopen("/etc/motd", "r");
    if (fp == NULL)
    {
        printf("File open failed\n");
        return 1;
    }

    while ((read = getline(&line, &len, fp)) != -1) {
        // At this point we have a line held within 'line'
        printf("Line: %s", line);
        const char * delim = " \n";
        char * ptr;
        ptr = strtok(line, delim);

        while (ptr != NULL)
        {
            printf("Word: %s\n", ptr);
            ptr = strtok(NULL, delim);
        }
    }

    if (line)
    {
        free(line);
    }
    fclose(fp);
    return 0;
}
amo-ej1
Yep, not in the standard, but *much* safer than fgets().
dmckee