OK, here's some code that outlines what I'm trying to do.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <iostream>
#include <sstream>

int main( int c, char *v[] )
{
    int fd = open( "data.out", O_RDONLY | O_NONBLOCK );
    std::cout << "fd = " << fd << std::endl;

    char buffer[ 1024000 ];
    ssize_t nread;

    std::stringstream ss;

    while( true )
    {
        // read() returns 0 at end of file and -1 on error; stop in both cases.
        if ( (nread = read( fd, buffer, sizeof( buffer ) - 1 )) <= 0 )
            break;

        ss.write( buffer, nread );

        while( true )
        {
            std::stringstream s2;

            std::cout << "pre-get  : " <<
                (((ss.rdstate() & std::ios::badbit) == std::ios::badbit) ? "bad" : "") << " " <<
                (((ss.rdstate() & std::ios::eofbit) == std::ios::eofbit) ? "eof" : "") << " " <<
                (((ss.rdstate() & std::ios::failbit) == std::ios::failbit) ? "fail" : "" ) << " " <<
                std::endl;

            ss.get( *s2.rdbuf() );

            std::cout << "post-get : " <<
                (((ss.rdstate() & std::ios::badbit) == std::ios::badbit) ? "bad" : "") << " " <<
                (((ss.rdstate() & std::ios::eofbit) == std::ios::eofbit) ? "eof" : "") << " " <<
                (((ss.rdstate() & std::ios::failbit) == std::ios::failbit) ? "fail" : "" ) << " " <<
                std::endl;

            unsigned int linelen = ss.gcount() - 1;

            if ( ss.eof() )
            {
                ss.str( s2.str() );
                break;
            }
            else if ( ss.fail() )
            {
                ss.str( "" );
                break;
            }
            else
            {
                std::cout << s2.str() << std::endl;
            }
        }
    }
}

It first reads large chunks of data into a buffer. I know there are better C++ ways of doing this part, but in my real application I am handed a char[] buffer and a length.

I then write the buffer into a std::stringstream object so I can remove a line at a time from it.

I thought I'd use the get( streambuf & ) method on the stringstream to write one line to another stringstream, from which I can then output it.

Leaving aside that this may not be the best way to extract a line at a time from the buffer I've read in (though I'd welcome a better alternative to the one I post here), the problem is that as soon as the first ss.get( *s2.rdbuf() ) is called, ss is in a fail state, and I can't work out why. There's plenty of data in the input file, so ss should definitely contain more than one line of input.

Any ideas?

A: 

I tested this on Windows, so you may want to verify it on your platform.

If data.out starts with a newline, I get the same problem you have; otherwise ss.get( *s2.rdbuf() ) works fine for the first call.

On the second call, however, the stream position has not advanced past the EOL, because get leaves the delimiter in the stream. The second get therefore hits the EOL immediately, and since no characters have been copied, it sets the fail bit.

Quick and maybe dirty fix:

ss.get( *s2.rdbuf() );
// Discard the '\n' that get() left in the stream (with \r\n line endings, the '\r' stays at the end of the extracted line)
ss.get();
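
To make the behaviour above concrete, here is a minimal sketch (not the asker's code; the function names second_get_fails and split_with_get are made up for illustration). The first function reproduces the failure: two back-to-back get( streambuf & ) calls, where the second one copies nothing because the '\n' is still in the stream. The second function applies the fix, and also skips the get call entirely for empty lines, since an empty line would otherwise set the fail bit too.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true if a second get(streambuf&) fails
// because the first call left the '\n' delimiter in the stream.
bool second_get_fails(const std::string& text)
{
    std::istringstream in(text);
    std::stringbuf first, second;
    in.get(first);   // copies up to, but not including, the first '\n'
    in.get(second);  // sees '\n' immediately, copies nothing -> failbit
    return in.fail();
}

// Hypothetical helper: splits text into lines with get(streambuf&),
// discarding the delimiter by hand after every line.
std::vector<std::string> split_with_get(const std::string& text)
{
    typedef std::istringstream::traits_type traits;
    std::istringstream in(text);
    std::vector<std::string> lines;

    while (true)
    {
        const traits::int_type next = in.peek();
        if (traits::eq_int_type(next, traits::eof()))
            break;                          // nothing left to read

        std::stringbuf line;
        if (!traits::eq_int_type(next, traits::to_int_type('\n')))
            in.get(line);                   // at least one char will be copied
        lines.push_back(line.str());        // empty string for an empty line

        if (traits::eq_int_type(in.peek(), traits::to_int_type('\n')))
            in.get();                       // discard the delimiter
    }
    return lines;
}
```

If you don't specifically need get( streambuf & ), std::getline( ss, line ) with a std::string extracts the line and discards the '\n' in one call, which avoids the problem altogether.
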
Fredrik Jansson
A: 

It seems to me that the first (and probably biggest) step to get decent efficiency is to minimize copying the data. Since you're being given the data in a char[] with a length, my first tendency would be to start by creating a strstream using that buffer. Then instead of copying a string at a time to another strstream (or stringstream) I'd copy strings one at a time to the stream you'll use to write them to the output.
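
As a rough sketch of the "copy each line straight to the output" idea: strstream has been deprecated since C++98, so this version swaps it for a plain std::memchr scan over the buffer, which is a different technique from the one named above but has the same goal of avoiding an intermediate stream. The class name LineWriter and the demo function are hypothetical. It assumes chunks can end in the middle of a line, so the unfinished tail is kept until the next chunk arrives.

```cpp
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: forwards complete lines from successive chunks
// to an output stream, buffering only an unfinished trailing line.
class LineWriter
{
public:
    explicit LineWriter(std::ostream& out) : out_(out), lines_(0) {}

    void feed(const char* data, std::size_t len)
    {
        const char* cur = data;
        const char* end = data + len;
        while (cur < end)
        {
            const char* nl = static_cast<const char*>(
                std::memchr(cur, '\n', static_cast<std::size_t>(end - cur)));
            if (nl == 0)
            {
                // No newline left in this chunk: keep the tail for later.
                pending_.append(cur, static_cast<std::size_t>(end - cur));
                return;
            }
            // Emit any carried-over prefix, then the rest of the line.
            out_.write(pending_.data(),
                       static_cast<std::streamsize>(pending_.size()));
            pending_.clear();
            out_.write(cur, static_cast<std::streamsize>(nl - cur));
            out_.put('\n');
            ++lines_;
            cur = nl + 1;
        }
    }

    // Flush a final line that had no trailing newline.
    void finish()
    {
        if (!pending_.empty())
        {
            out_ << pending_ << '\n';
            pending_.clear();
            ++lines_;
        }
    }

    std::size_t lines() const { return lines_; }

private:
    std::ostream& out_;
    std::string pending_;
    std::size_t lines_;
};

// Usage example: the line "cde" is split across two chunks.
std::string demo_line_writer()
{
    std::ostringstream out;
    LineWriter writer(out);
    writer.feed("ab\ncd", 5);
    writer.feed("e\nf", 3);
    writer.finish();
    return out.str();  // "ab\ncde\nf\n"
}
```

The only copy made is of a line that straddles two chunks; every other line goes from the caller's buffer directly to the output stream.
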

If you're allowed to modify the contents of the buffer, another possibility would be to parse the buffer into lines by simply replacing each '\n' with a '\0'. If you do that, you'll usually want to build a vector (deque, etc.) of pointers to the beginning of each line as well: find the first '\r' or '\n' and replace it with a '\0'; the next character other than a '\r' or '\n' is the beginning of the next line, so store its address in your vector.
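
The in-place approach could look something like the sketch below. The function name split_in_place is hypothetical, and it makes two assumptions worth stating loudly: the caller has reserved one writable byte past the data (as the question's read of sizeof( buffer ) - 1 bytes does), and empty lines may be dropped, since a run of '\r' and '\n' characters is treated as a single separator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: overwrites every '\r' and '\n' in buf[0..len)
// with '\0' and returns a pointer to the start of each non-empty line.
// ASSUMPTION: buf[len] is writable, so the last line can be terminated.
std::vector<char*> split_in_place(char* buf, std::size_t len)
{
    assert(buf != 0);
    std::vector<char*> lines;
    buf[len] = '\0';  // terminate a final line that has no newline

    std::size_t i = 0;
    while (i < len)
    {
        // Replace a run of line terminators with '\0'; empty lines vanish.
        while (i < len && (buf[i] == '\r' || buf[i] == '\n'))
            buf[i++] = '\0';
        if (i == len)
            break;

        lines.push_back(buf + i);  // first character of the next line

        // Advance to the terminator that ends this line.
        while (i < len && buf[i] != '\r' && buf[i] != '\n')
            ++i;
    }
    return lines;
}
```

Each pointer in the result is then an ordinary C string living inside the original buffer, so nothing is copied. The pointers are only valid for as long as the buffer is, and a line that is cut off at the end of one chunk still has to be joined with the start of the next.
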

I'd also think hard about whether you can avoid the line-at-a-time output. Reading through a large buffer to find newlines is relatively slow. If you're going to end up writing one line after another anyway, you could avoid all this by just writing the whole buffer to the output stream and being done with it.
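
For completeness, the "just write the whole buffer" option is about as short as it sounds. This is only a sketch; forward_chunk and the demo function are hypothetical names, and it assumes the output should be byte-for-byte what came in.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: forwards one chunk unchanged and reports whether
// the output stream is still in a good state afterwards.
bool forward_chunk(std::ostream& out, const char* data, std::size_t len)
{
    out.write(data, static_cast<std::streamsize>(len));
    return out.good();
}

// Usage example: a line split across two chunks comes out intact,
// because no byte is dropped or reordered.
std::string demo_forward()
{
    std::ostringstream out;
    forward_chunk(out, "ab\ncd", 5);
    forward_chunk(out, "e\n", 2);
    return out.str();  // "ab\ncde\n"
}
```

Since the chunks are forwarded verbatim, line boundaries that fall across two chunks need no special handling at all.
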

Jerry Coffin