views: 233 | answers: 6

My C++ program needs to know how many lines are in a certain text file. I could do it with getline() and a while-loop, but is there a better way?
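
For reference, a minimal version of the getline() loop I mean might look like this (the file name "input.txt" is just a placeholder):

#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream in("input.txt");   // placeholder file name
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line))   // one iteration per line
        ++count;
    std::cout << count << '\n';
}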

+1  A: 

Iterate over the file char-by-char with get(), and for each newline ('\n') increment the line count by one.
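
A minimal sketch of that approach (the file name is just a placeholder):

#include <cstddef>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream in("input.txt");   // placeholder file name
    std::size_t lines = 0;
    char c;
    while (in.get(c))                // read one character at a time
        if (c == '\n')               // count each newline
            ++lines;
    std::cout << lines << '\n';
}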

reko_t
That method is worse than the one I was talking about. I'm trying to avoid reading the whole file in.
Phenom
this is way too slow
knittl
@knittl: how do you know? Ever heard of premature optimisation?
Paul R
@Phenom: no - the char-by-char method and getline method do exactly the same thing - they read the entire file looking for end of line characters
Paul R
It'll be faster than `getline()`. The fastest way would be to `mmap()` the file and then count `\n`s.
Andrew McGregor
@Phenom RE: "avoid reading the whole file in" - unless you've preprocessed or otherwise have some index of these files, you will have to read all of the contents of the file. You won't necessarily have to have the entire file in memory, but at some point you will read every byte of the file.
Chris Schmich
`getline()` is more readable
knittl
@knittl: note that `getline()` may fail if you have excessively long lines - you will need extra code to handle this case, so the getchar approach may actually be more readable
Paul R
+4  A: 

No.

Not unless your operating system's filesystem keeps track of the number of lines, which your system almost certainly doesn't as it's been a looong time since I've seen that.

msw
VMS is the only operating system I know of that does this - it treats each line of a text file as a "record"
Paul R
I was wondering if some file systems actually do/did that. Nice to know.
peterchen
+2  A: 

By "another way", do you mean a faster way? No matter what, you'll need to read in the entire contents of the file. Reading in different-sized chunks shouldn't matter much since the OS or the underlying file libraries (or both) are buffering the file contents.

getline could be problematic if there are only a few lines in a very large file (high transient memory usage), so you might want to read in fixed-size 4KB chunks and process them one-by-one.
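
A rough sketch of that chunked approach, assuming a 4 KB buffer and a placeholder file name:

#include <cstddef>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream in("input.txt", std::ios::binary);   // placeholder file name
    char buf[4096];
    std::size_t lines = 0;
    while (in) {
        in.read(buf, sizeof buf);              // read up to 4 KB
        std::streamsize got = in.gcount();     // bytes actually read this pass
        for (std::streamsize i = 0; i < got; ++i)
            if (buf[i] == '\n')
                ++lines;
    }
    std::cout << lines << '\n';
}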

Chris Schmich
+1  A: 

The fastest, but OS-dependent, way would be to map the whole file into memory (if it is not possible to map the whole file at once, map it in chunks sequentially) and call std::count(mem_map_begin, mem_map_end, '\n').
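
A rough, POSIX-only sketch of this, assuming the whole file can be mapped at once (the file name is a placeholder):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <algorithm>
#include <iostream>

int main()
{
    int fd = open("input.txt", O_RDONLY);      // placeholder file name
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return 1; }
    if (st.st_size == 0) {                     // mmap() fails on zero-length files
        std::cout << 0 << '\n';
        close(fd);
        return 0;
    }

    void* map = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { close(fd); return 1; }

    const char* data = static_cast<const char*>(map);
    std::cout << std::count(data, data + st.st_size, '\n') << '\n';

    munmap(map, st.st_size);
    close(fd);
}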

catwalk
Which OSes can do this?
Phenom
links for the most common platforms: unix: http://linux.die.net/man/2/mmap windows: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx
catwalk
Why do you think this would be faster than `getline`?
ChrisW
@ChrisW: `mmap` is faster than `getline` because it usually avoids extra buffering at the standard-library level and data movement between kernel and user space. `getline` could be made more efficient, but I think its designers went for a more generic and portable approach rather than pure speed.
catwalk
A: 

I don't know if getline() is the best choice - its buffer size is variable, and in the worst case (a long sequence of \n characters) it could end up reading byte after byte on each iteration.

For me it would be better to read the file in chunks of a predetermined size and then scan each chunk for newline sequences. There is one risk I don't know how to resolve, though: file encodings other than ASCII. It would be easiest if getline() handled those, but I don't think it does.

Some URLs:

http://stackoverflow.com/questions/1509277/why-does-wide-file-stream-in-c-narrow-written-data-by-default/

http://en.wikipedia.org/wiki/Newline

XAder
A: 

Possibly the fastest way is to use the low-level read() and scan the buffer for '\n':

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Returns the number of '\n' characters in the file, or -1 on error.
int clines(const char* fname)
{
    int nfd;
    ssize_t nLen;
    int count = 0;
    char buf[BUFSIZ+1];

    if((nfd = open(fname, O_RDONLY)) < 0) {
        return -1;
    }

    // Read the file in BUFSIZ-sized chunks.
    while( (nLen = read(nfd, buf, BUFSIZ)) > 0 )
    {
        char *p = buf;
        ssize_t n = nLen;
        // Scan the current chunk for newlines with memchr().
        while( n && (p = (char*)memchr(p, '\n', n)) ) {
            p++;                    // step past the newline just found
            n = nLen - (p - buf);   // bytes remaining in this chunk
            count++;
        }
    }
    close(nfd);
    return count;
}
oraz