My C++ program needs to know how many lines are in a certain text file. I could do it with getline() and a while-loop, but is there a better way?
Iterate the file char-by-char with get(), and for each newline ('\n') increment the line count by one.
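A minimal sketch of that approach (the file name is just a placeholder). Note that it counts newline characters, so a final line without a trailing '\n' is not counted:

#include <fstream>
#include <iostream>

int main()
{
    std::ifstream file("input.txt");  // placeholder file name
    int lines = 0;
    char c;
    while (file.get(c)) {   // read one character at a time
        if (c == '\n')
            ++lines;        // one more line ended
    }
    std::cout << lines << '\n';
}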
No. Not unless your operating system's filesystem keeps track of the number of lines, which it almost certainly doesn't; it has been a long time since I've seen a system that did.
By "another way", do you mean a faster way? No matter what, you'll need to read in the entire contents of the file. Reading in different-sized chunks shouldn't matter much since the OS or the underlying file libraries (or both) are buffering the file contents.
getline() could be problematic if there are only a few lines in a very large file (high transient memory usage), so you might want to read fixed-size 4 KB chunks and process them one by one.
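A sketch of the chunked approach, assuming a 4 KB buffer (both the chunk size and the function name are arbitrary here):

#include <algorithm>
#include <fstream>

long count_lines(const char* fname)
{
    std::ifstream file(fname, std::ios::binary);
    if (!file)
        return -1;

    char buf[4096];
    long count = 0;
    // read() may leave a short final chunk; gcount() reports
    // how many bytes were actually read into the buffer
    while (file.read(buf, sizeof buf) || file.gcount() > 0)
        count += std::count(buf, buf + file.gcount(), '\n');
    return count;
}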
The fastest, but OS-dependent, way would be to map the whole file into memory (if it can't be mapped all at once, map it in sequential chunks) and call std::count(mem_map_begin, mem_map_end, '\n').
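A POSIX-flavoured sketch of that idea (mmap() is not portable; on Windows you would use MapViewOfFile instead, and mapping the whole file at once assumes it fits in the address space):

#include <algorithm>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

long count_lines_mmap(const char* fname)
{
    int fd = open(fname, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  // mmap rejects zero-length mappings
        close(fd);
        return 0;
    }

    char* mem = static_cast<char*>(
        mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    if (mem == MAP_FAILED) {
        close(fd);
        return -1;
    }

    long count = std::count(mem, mem + st.st_size, '\n');
    munmap(mem, st.st_size);
    close(fd);
    return count;
}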
I don't know if getline() is the best choice: its buffer size is variable, and in the worst case (a long run of '\n') it could end up reading byte after byte on each iteration. To me it would be better to read the file in chunks of a predetermined size and then count the newline characters ('\n') inside each chunk.

There is one risk I cannot resolve, though: file encodings other than ASCII. If getline() handles those, it's the easiest option, but I don't think it does.
Possibly the fastest way is to use the low-level read() and scan the buffer for '\n':
#include <fcntl.h>   /* open */
#include <unistd.h>  /* read, close */
#include <string.h>  /* memchr */
#include <stdio.h>   /* BUFSIZ */

int clines(const char* fname)
{
    int nfd, nLen;
    int count = 0;
    char buf[BUFSIZ];

    if ((nfd = open(fname, O_RDONLY)) < 0) {
        return -1;
    }

    /* Read the file in BUFSIZ-sized blocks and count the
       newlines in each block with memchr(). */
    while ((nLen = read(nfd, buf, BUFSIZ)) > 0)
    {
        char *p = buf;
        int n = nLen;
        while (n && (p = (char *)memchr(p, '\n', n))) {
            p++;                   /* step past the newline   */
            n = nLen - (p - buf);  /* bytes left in the block */
            count++;
        }
    }
    close(nfd);
    return count;
}
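For example, a small driver (assuming the clines() above is in the same file):

#include <cstdio>

int main(int argc, char* argv[])
{
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int n = clines(argv[1]);
    if (n < 0) {
        std::perror(argv[1]);  // open() failed, errno explains why
        return 1;
    }
    std::printf("%d\n", n);
    return 0;
}

Note that, like the other newline-counting approaches here, this does not count a final line that lacks a trailing '\n'.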