When running the following code, the number of lines read is one less than there actually are (whether the input file is main itself or some other file). Why is this, and how can I change that (besides just adding 1)?

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    // open text file for input
    string file_name;

    cout << "please enter file name: ";
    cin  >> file_name;

    // associate the input file stream with a text file
    ifstream infile(file_name.c_str());

    // error checking for a valid filename
    if ( !infile ) {
        cerr << "Unable to open file "
             << file_name << " -- quitting!\n";
        return( -1 );
    }
    else cout << "\n";

    // some data structures to perform the function
    vector<string> lines_of_text;
    string textline;

    // read in text file, line by line
    while (getline( infile, textline, '\n' ))   {
        // add the new element to the vector
        lines_of_text.push_back( textline );

        // print the 'back' vector element - see the STL documentation
        cout << "line read: " << lines_of_text.back() << "\n";
    }
    cout << lines_of_text.size();
    return 0;
}
A: 

Well, if the last line of your file is just '\n', you don't push it into the vector. If you want it to be there, change the loop to:

while (getline( infile, textline, '\n' ).gcount() > 0) 
{
    if (infile.fail()) break; //An error occurred, break or do something else

    // add the new element to the vector
    lines_of_text.push_back( textline );

    // print the 'back' vector element - see the STL documentation
    cout << "line read: " << lines_of_text.back() << "\n";
}

Use the gcount() member to check how many characters the last read extracted; it will return 1 if only a delimiter character was read.

David
That's right but I don't think your fix works at all.
Bus
@Bus: You're right, I've edited to fix that.
David
Why wouldn't a line with no text (other than the delimiting newline) be pushed to the vector?
wilhelmtell
If the last line is empty (e.g. the file ends with "\n\n"), then he *does* store it in the vector.
Roger Pate
A: 

OK, so here is an explanation that you will hopefully understand. Your code should work fine if the file we're talking about doesn't end with a newline. But what if it does? Let's say it looks like this:

"line 1"
"line 2"
""

Or as a sequence of characters:

line 1\nline 2\n

This file has THREE lines -- the last one is empty, but it's there. After calling getline twice, you've read all the characters from the file. The third call to getline will say "oops, end of file, sorry, no more characters", so you'll see only two lines of text.

Bus
But of course some people may think that this particular example file only really has two lines. It's a matter of taste.
Bus
It's not a matter of taste. There are, believe it or not, standards on text files. That file has 2 lines.
Roger Pate
@Roger lol no, that file has three lines
wilhelmtell
most text editors will append a newline at the end of your file, so the file will look like `line 1\nline 2\n\n`
wilhelmtell
If files are guaranteed to have a suffix newline then the number of lines is always the number of newlines. Otherwise it's the number of newlines plus one.
wilhelmtell
@wil: The file content explicitly given in the above answer says `line 1\nline 2\n`, which has two lines. If it was `line 1\nline 2\n\n` instead, it would have three lines. Some editors show a "phantom" last line, but what they are really showing is an insertion point that, if you put text there, would become that line. (I prefer this convention in editors myself, but it's really relatively minor and I more often go without it.)
Roger Pate
@Roger so you're trying to tell me that some standard forbids me to have my own opinion about the number of lines in this file? I think that from user's point of view it's completely logical to think of this file as having three lines. (There may be industry standards but we're talking about a person trying to count the number of lines displayed on his screen. He doesn't care about standards which don't apply to him.)
Bus
@Roger how many lines would this file have: line 1\nline 2\nline 3 ?
Bus
@Bus: If he wants to count lines displayed by a particular editor, then he needs to determine how that editor displays text. That can be different from how the file is stored, and in many more ways than just this, such as folding.
Roger Pate
@Bus: Newline characters are line terminators, not line separators. The C++ standard, for example, explicitly requires the last line of a file to end with a newline. "If a source file that is not empty does not end in a new-line character .. the behavior is undefined." [2.1/1, C++03]
Roger Pate
@Roger I'm very much aware of this requirement in many applications and I totally agree that it's a good convention -- but we're not talking about C++ source files or any standardized file format here. The particular editor which ace uses counts the lines in the way I described and I was just trying to explain to him why his program outputs a different value. (And I personally think that counting lines this way is very logical -- shouldn't newline character tell us "hey, NEW LINE follows" rather than "line ENDS here"?)
Bus
@Roger this question was not about how should we count lines in our text files but why does this program output different line count than some particular editor.
Bus
@Bus: "Newline" is actually "line feed", which might help you see the "line ends here" meaning more clearly. The C++ requirement was just one example, and if he wants to define his own standard, then I can't stop him, of course. However, what I understand CodeBlocks to be doing (what you showed in your answer) is still consistent with established standards, using the meaning I mentioned to Wilhelm above.
Roger Pate
@Roger I wasn't mistaken, and you weren't mistaken either. @Bus posted two representations of the file that either don't agree with each other, or the sequential representation is invalid. Text files _should_ have a trailing newline; that's what editors expect. When I said three lines I referred to the representation without the `\n` symbols. That has an empty line at the end. A file with a single empty line, of course, should be `\n\n`: a single line.
wilhelmtell
@Roger I'm telling you what you already know and you're telling me what I already know. But hopefully this discussion has at least some educational value for ace or anyone else reading this :).
Bus
@Wilhelm The first representation shows how the file would look in ace's editor. The second representation is correct.
Bus
@wil: No, a file with a single empty line would be `\n`, while `\n\n` would be a file with two empty lines. What Bus showed is exactly how some editors show it, complete with a line number 3 (if line numbers are enabled), and still save it as `line 1\nline 2\n` (a two line file). This is why I pointed out those editors are showing an *insertion point* (at the end of the file) in an earlier comment.
Roger Pate
:s You're right. I don't know what I was smoking, but I surely should have slept last night.
wilhelmtell
A: 

The code you have is sound. Here's a small test case that might help:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

void read_lines(std::istream& input) {
  using namespace std;
  vector<string> lines;
  for (string line; getline(input, line);) {
    lines.push_back(line);
    cout << "read: " << lines.back() << '\n';
  }
  cout << "size: " << lines.size() << '\n';
}

int main() {
  {
    std::istringstream ss ("abc\n\n");
    read_lines(ss);
  }
  std::cout << "---\n";
  {
    std::istringstream ss ("abc\n123\n");
    read_lines(ss);
  }
  std::cout << "---\n";
  {
    std::istringstream ss ("abc\n123");  // last line missing newline
    read_lines(ss);
  }
  return 0;
}

Output:

read: abc
read: 
size: 2
---
read: abc
read: 123
size: 2
---
read: abc
read: 123
size: 2
Roger Pate
A: 

I think I have tracked down the source of your problem. In Code::Blocks, a completely empty file will report that there is 1 line in it (the current one) in the gizmo on the status bar at the bottom of the IDE. This means that were you actually to enter a line of text, it would be line 1. In other words, Code::Blocks will normally over-report the number of actual lines in a file. You should never depend on CB, or any other IDE, to find out info on files; that's not what they are for.

anon
OK, cool. I'm just creating a line counter for a project, so it was/is necessary.
ace
Your code is thus less buggy than Code::Blocks'. Something to be proud of, I suppose.
wilhelmtell
@ace: Have you seen [wc](http://en.wikipedia.org/wiki/Wc_%28Unix%29)?
Roger Pate
@Wilhelm I don't think CB's code is buggy - it is reporting where the cursor is - it's on line 1.
anon
@Neil well then not buggy, misleading. :p
wilhelmtell
@Wilhelm Not misleading either - most text editors and word processors work that way (though not all; vim would report it as line 0). The problem is in the OP's lack of understanding.
anon
@Neil I find it misleading, excuse me. What does CB's line field say when the file is empty? That the cursor is on line 1? How can it possibly be on line 1 if _there is no_ line 1? When you start typing text followed by a newline, then move the cursor to the top of the file, does it say line 1, again? That's misleading. It sounds like CB talks about _visual lines_, not lines of text as defined by the newline character. Of course the OP (or I) misunderstand something. That's because that line field _is misleading_.
wilhelmtell
correction for self: no, wrong. ksrythxbai
wilhelmtell