What's the most compact way to compute the number of lines of a file? I need this information to create/initialize a matrix data structure.

Later I have to go through the file again and store the information inside a matrix.

Update: I tried the code below, based on Dave Gamble's answer, but why doesn't it compile? Note that the file could be very large, so I want to avoid using a container in order to save memory.

#include <iostream>      
#include <vector>        
#include <fstream>       
#include <sstream>       
using namespace std;     


int main  ( int arg_count, char *arg_vec[] ) {
    if (arg_count !=2 ) {
        cerr << "expected one argument" << endl;
        return EXIT_FAILURE;      
    }

    string line;
    ifstream myfile (arg_vec[1]);

    FILE *f=fopen(myfile,"rb");
    int c=0,b;
    while ((b=fgetc(f))!=EOF) c+=(b==10)?1:0;
    fseek(f,0,SEEK_SET);


    return 0;
}
+1  A: 

Count the number of instances of '\n'. This works for *nix (\n) and DOS/Windows (\r\n) line endings, but not for classic Mac OS (version 9 and earlier), which used just \r. I've never seen a case come up with just \r as line endings, so I wouldn't worry about it unless you know it's going to be an issue.

Edit: If your input is not ASCII, then you could run into encoding problems as well. What does your input look like?

280Z28
That might not be cross-platform (I'm just saying).
Lucas McCoy
@280Z28: How do you do that?
neversaint
+4  A: 
FILE *f=fopen(filename,"rb");

int c=0,b;while ((b=fgetc(f))!=EOF) c+=(b==10)?1:0;fseek(f,0,SEEK_SET);

Answer in C. Is that the kind of compact you mean?

Dave Gamble
Ew. [Blah blah 15 characters]
GMan
Be grateful I left in the ?1:0. It works fine without. I couldda saved 4 chars there ;)
Dave Gamble
`int c=0;while(!fscanf(f,"%*[^\n]%*c"))c++;fseek(f,0,SEEK_SET)`
Adam Rosenfield
That's magical. I bow down to the excellence of using fscanf like that.
Dave Gamble
@DG: please advise based on my update.
neversaint
D'oh, apparently my solution fails if the input contains an empty line -- the `%*[^\n]` directive doesn't match any characters in that case, so it will loop infinitely. There's not an elegant way to fix that without bloating the char count significantly. =/
Adam Rosenfield
I never knew scanf accepted regex parameters... still an amazingly cool trick!
Dave Gamble
@Dave: would your solution be able to handle a blank line? Also, what is the purpose of the fseek? I ran your code without it and it printed the correct number of lines.
Hristo
@Hristo yes, blank lines are just fine; we count the literal number of '\n' characters in the file. The fseek is a nicety to rewind to the start of the file, since the OP then needs to go through it again. Of course, do note that this is all rather tongue-in-cheek; reading individual bytes is generally far less efficient than allocating some memory and reading large blocks from the file... but the OP asked for compactness.
Dave Gamble
+9  A: 

If the reason you need to "go back again" is because you cannot continue without the size, try re-ordering your setup.

That is, read through the file, storing each line in a std::vector<string> or something. Then you have the size, along with the lines in the file:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main(void)
{
    std::fstream file("main.cpp");
    std::vector<std::string> fileData;

    // read in each line
    std::string dummy;
    while (getline(file, dummy))
    {
        fileData.push_back(dummy);
    }

    // and size is available, along with the file
    // being in memory (faster than hard drive)
    size_t fileLines = fileData.size();

    std::cout << "Number of lines: " << fileLines << std::endl;
}


Here is a solution without the container:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main(void)
{
    std::fstream file("main.cpp");
    size_t fileLines = 0;    

    // read in each line
    std::string dummy;
    while (getline(file, dummy))
    {
        ++fileLines;
    }

    std::cout << "Number of lines: " << fileLines << std::endl;
}

Though I doubt that's the most efficient way. The benefit of the first version was the ability to store the lines in memory as you went.

GMan
This is the method I was going to suggest, but it's been long enough since I used C++ that I was going to have to actually test it all out before posting. Thanks for saving me the time. +1 :D
280Z28
I +REALLY+ love when I can tell someone's actually tested the code, by virtue of the fact that the name of the sourcefile is there as input :D:D
Dave Gamble
lol, yeah. I had that 'bingo!' moment when I thought, "I need a file to test with....oh duh"
GMan
If we just need to count, std::istream::ignore() should be more efficient than getline.
Luc Hermitte
+6  A: 

I think this might do it...

std::ifstream file(f);
int n = std::count(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>(), '\n') + 1;
Evan Teran
+1 for knowing how to use streambufs.
quark
+3  A: 
#include <stdlib.h>
int main(void) { system("wc -l plainfile.txt"); }
Rodrigo