
In the following C++ code, I realised that gcount() was returning a larger number than I wanted, because getline() consumes the final newline character but doesn't store it in the buffer.

What I still don't understand is the program's output, though. For input "Test\n", why do I get " est\n"? How come my mistake affects the first character of the string rather than adding unwanted rubbish onto the end? And how come the program's output is at odds with the way the string looks in the debugger ("Test\n", as I'd expect)?

#include <fstream>
#include <vector>
#include <string>
#include <iostream>

using namespace std;

int main()
{
    const int bufferSize = 1024;
    ifstream input( "test.txt", ios::in | ios::binary );

    vector<char> vecBuffer( bufferSize );
    input.getline( &vecBuffer[0], bufferSize );
    string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() );
    cout << strResult << "\n";

    return 0;
}
+1  A: 

I tested your code using Visual Studio 2005 SP2 on Windows XP Pro SP3 (32-bit), and everything works fine.

Naaff
Interesting. How did you create the test file? I used Notepad++.
Tommy Herbert
Notepad. Typed "Test" and then pressed return and saved "test.txt".
Naaff
What happens when you change your cout line to this?: cout << "strResult = \"" << strResult << "\"\n";
Naaff
Weird: I just did the same and it spat out " est". The "About" box for my installation of Visual Studio has "(SP.050727-7600)" after the version number. So I guess I don't have SP2 installed. Do you think that could be it?
Tommy Herbert
Oh, hang on, I'll try what you suggested - I hadn't updated before commenting.
Tommy Herbert
' "rResult = "Test' - which appears to support T.E.D.'s theory.
Tommy Herbert
Looks likely. You could try building your string like this to crop off the last character (newline): string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() - 1 );
Naaff
Yes, that makes everything fine. Mysterious though.
Tommy Herbert
+5  A: 

I've duplicated Tommy's problem on a Windows XP Pro Service Pack 2 system with the code compiled using Visual Studio 2005 SP2 (actually, it says "Version 8.0.50727.879"), built as a console project.

If my test.txt file contains just "Test" and a line break (CR+LF), the program spits out " est" (note the leading space) when run.

If I had to take a wild guess, I'd say that this version of the implementation has a bug where it is treating the Windows newline character like it should be treated in Unix (as a "go to the front of the same line" character), and then it wipes out the first character to hold part of the next prompt or something.


Update: After playing with it a bit, I'm positive that is what is going on. If you look at strResult in the debugger, you will see that it copied over a character with decimal value 13 near the end. That's CR ('\r'), which Windows text files pair with LF ('\n') to mark the end of a line, and which a console on its own treats as "return to the beginning of the line". If I instead change your constructor to read:

string strResult( vecBuffer.begin(), vecBuffer.begin() + input.gcount() - 1 );

...(so that the CR isn't copied) then it prints out "Test" like you'd expect.

T.E.D.
That sounds promising. The newline character shouldn't be getting into the string at all, but maybe Notepad's CR+LF newlines are interpreted as a weird instruction followed by a new line. The weird instruction must be to go to the front and then print a space; see the comments on Naaff's answer. I wonder why he couldn't reproduce it.
Tommy Herbert
I think I wasn't able to reproduce the problem because I hadn't put your code into a console project. I just tacked it onto some code that I was working on, not thinking that it would matter.
Naaff
I've had this happen on Windows more than once. And each time I spend an unusually long time staring at my monitor before I remember what's up.
Max Lybbert
I used emacs to create the file, so you can't blame notepad.
T.E.D.
input.gcount() - 1 still copies the \r, but skips the \0 (see my answer).
Dolphin
+10  A: 

I've also duplicated this result, Windows Vista, Visual Studio 2005 SP2.

When I figure out what the heck is happening, I'll update this post.

edit: Okay, there we go. The problem (and the different results people are getting) comes from the \r. What happens is that you call input.getline and put the result in vecBuffer. The getline function strips off the \n but leaves the \r in place.

You then transfer vecBuffer to a string variable, but take the length from input.gcount(), which gives you one char too many: gcount() counts the \n that getline extracted from the stream, but getline wrote a terminating '\0' into vecBuffer in its place.
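A minimal sketch of the mismatch, assuming a std::istringstream holding "Test\r\n" as a stand-in for the binary-mode file. The names countFirstLine and LineCounts are hypothetical, made up for this illustration:

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the two counts the question's code conflates.
struct LineCounts {
    std::streamsize extracted; // what gcount() reports after getline()
    std::size_t stored;        // characters written to the buffer before its '\0'
};

// Reads the first line of 'contents' the same way the question does
// (istream::getline into a char buffer) and reports both counts.
LineCounts countFirstLine(const std::string& contents, std::size_t bufferSize = 1024)
{
    assert(bufferSize > 0);
    std::istringstream input(contents); // stands in for the binary-mode ifstream
    std::vector<char> buffer(bufferSize);
    input.getline(&buffer[0], static_cast<std::streamsize>(buffer.size()));

    // gcount() includes the '\n' delimiter that getline() consumed, while the
    // buffer holds a terminating '\0' where that delimiter would have gone.
    LineCounts counts = { input.gcount(), std::strlen(&buffer[0]) };
    return counts;
}
```

For "Test\r\n", extracted should be 6 and stored 5; for a Unix-style "Test\n", 5 and 4. Either way the difference is the delimiter that getline() consumed and replaced with '\0'.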

The resulting strResult is:

-    strResult "Test"
     [0] 84 'T' char
     [1] 101 'e' char
     [2] 115 's' char
     [3] 116 't' char
     [4] 13 '␍' char
     [5] 0 char

So then "Test" is printed, followed by a carriage return (which puts the cursor back at the start of the line), a null character (which the console shows as a blank, overwriting the T), and finally the \n, which moves the cursor to the next line.

So you either have to cut the string off before the \r (which also drops the \0 after it), or write a function that gets the string length directly from vecBuffer, stopping at the null character, and then strip the \r if you don't want it.
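A sketch of the second approach, assuming the same getline-into-a-buffer structure as the question and a line with no embedded null characters. readLineFromBuffer is a hypothetical name, and std::istringstream is included only so the helper can be tried without a file:

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream> // std::istringstream, handy for trying the helper without a file
#include <string>
#include <vector>

// Hypothetical helper: reads one line with istream::getline and returns it
// without the terminating '\0' or a trailing '\r' left by a CR+LF line ending.
std::string readLineFromBuffer(std::istream& input, std::size_t bufferSize = 1024)
{
    assert(bufferSize > 0);
    std::vector<char> buffer(bufferSize);
    input.getline(&buffer[0], static_cast<std::streamsize>(buffer.size()));

    // Take the length from the buffer itself (up to the '\0' getline wrote)
    // rather than from gcount(), which also counts the consumed '\n'.
    std::size_t length = std::strlen(&buffer[0]);

    // Drop the carriage return that binary mode leaves before the '\n'.
    if (length > 0 && buffer[length - 1] == '\r')
        --length;

    return std::string(&buffer[0], length);
}
```

With the question's code, the constructor line would become something like string strResult = readLineFromBuffer( input );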

Aistina
Excellent. Thanks for the explanation. It's what I'd started to suspect. The only remaining question is why Naaff's result differs from mine and T.E.D.'s.
Tommy Herbert
Not sure, maybe he missed his Enter key and ended up with a file without a line break? My first guess would've been that he had used an editor that uses Linux-style line breaks, but he says he used Notepad, so unless XP SP3 changed the behaviour of Notepad...
Aistina
The bit about only stripping one terminator is the rub I think. I checked the text file in emacs hexl-mode, and it ends with a CRLF combo. The LF on the end is not read in, but clearly the CR is.
T.E.D.
As noted on T.E.D.'s answer, I wasn't testing in a console project, so I suspect that might be the reason I didn't see the problem.
Naaff
The problem is that the 0 is getting output, and the console is overwriting the T with a blank character, while some other shells are just doing nothing. strResult.length()==6
Dolphin
+2  A: 

I am pretty sure that the T is actually getting written and then overwritten. Running the same program in an rxvt window (Cygwin) produces the expected output. You can do a couple of things. If you remove ios::binary from your open mode, the stream will automatically convert \r\n to \n on Windows, and the output will look like you expect (the string will still end with the embedded '\0' discussed below, though).
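Another option, sketched here under the assumption that you are free to drop the char buffer entirely, is std::getline with a std::string, which never stores the delimiter or a '\0' and so needs no gcount() arithmetic. readLine is a hypothetical helper name; it strips a trailing '\r' itself because text-mode translation only happens on Windows:

```cpp
#include <cassert>
#include <istream>
#include <sstream> // std::istringstream, handy for trying the helper without a file
#include <string>

// Hypothetical helper: reads one line with std::getline, which stores the line
// in a std::string without the '\n' delimiter and without any terminating '\0'.
// A trailing '\r' is removed explicitly, because only Windows text-mode
// streams translate "\r\n" to "\n"; a binary stream (or a CR+LF file read
// on a POSIX system) still delivers the '\r'.
bool readLine(std::istream& input, std::string& line)
{
    if (!std::getline(input, line))
        return false;
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);
    return true;
}
```

This works the same whether or not ios::binary is used, which avoids depending on the platform's newline translation.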

You can also open up your text file in the binary editor by clicking on the little down arrow on the open file dialog's open button and selecting open with...->Binary Editor. This will let you look at your file and confirm that it does indeed have \r\n and not just \n.
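If you'd rather check from code, a small hex dump does the same job as the binary editor (or emacs hexl-mode). This is a sketch; hexDump is a hypothetical name, and it assumes the stream was opened with ios::binary so nothing is translated on the way in:

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns every byte left in the stream as two-digit
// uppercase hex values separated by spaces, e.g. "54 65 73 74 0D 0A".
// Open files with ios::binary so that "\r\n" is not translated on the way in.
std::string hexDump(std::istream& in)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    char c;
    bool first = true;
    while (in.get(c)) {
        if (!first)
            out << ' ';
        first = false;
        // Go through unsigned char so bytes >= 0x80 don't print as negatives.
        out << std::setw(2) << static_cast<int>(static_cast<unsigned char>(c));
    }
    return out.str();
}
```

For a file containing "Test" and a Windows line break, the dump should read 54 65 73 74 0D 0A; a Unix-style file would end in 0A without the 0D. Usage: ifstream f( "test.txt", ios::in | ios::binary ); cout << hexDump( f ) << "\n";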

Edit: I redirected the output to a file and it is writing out:

Test\r\0\r\n

The reason you are getting the \0 is that gcount returns 6 (6 characters were removed from the stream), but the final delimiter is not copied to the buffer; a '\0' is stored instead. When you construct the string, you are actually telling it to include the '\0'. std::string has no problem with the embedded 0 and outputs it as asked. Some shells apparently display it as a blank character, overwriting the T, while others don't display anything and the output looks okay, but it is still probably wrong because the string contains the embedded '\0'.

cout << strResult.c_str() << "\n";

Changing the last line to this stops the output at the \0 and also gives the expected result.
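A sketch that captures both forms of output in std::ostringstream objects instead of the console, so the embedded '\0' is visible without depending on how a particular shell renders it. printBothWays and Printed are hypothetical names:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical result type: what each way of printing wrote to the stream.
struct Printed {
    std::string whole;   // result of  out << s
    std::string cString; // result of  out << s.c_str()
};

Printed printBothWays(const std::string& s)
{
    std::ostringstream whole, cString;
    whole << s;           // writes all s.size() characters, embedded '\0' included
    cString << s.c_str(); // treated as a C string: stops at the first '\0'
    Printed result = { whole.str(), cString.str() };
    return result;
}
```

For the six-character string "Test\r\0" that the question's code builds, whole should contain all six characters, '\0' included, while cString stops after "Test\r".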

Dolphin
Thanks, Dolphin. A lot of us had been assuming that gcount() returned 5. I don't know why - it seems obvious that it would be 6 now that you've pointed it out. The difference between subtracting 1 from gcount's result and subtracting 2 can't be seen using my posted code, but Naaff's suggested change to the cout line shows it up.
Tommy Herbert