In the code below, I have a corrupt "hello.bz2" that has stray characters beyond the EOF.

Is there a way to make the boost::iostreams::copy() call throw?

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>

int main() 
{
    using namespace std;
    using namespace boost::iostreams;

    ifstream file("hello.bz2", ios_base::in | ios_base::binary);
    filtering_streambuf<input> in;
    in.push(bzip2_decompressor());
    in.push(file);
    boost::iostreams::copy(in, cout);
}

EDIT: Please ignore the line that has attracted the most attention so far: the EOF. Please assume I am working with a corrupted bzip2 file. I wrote "EOF" because of the error I get when I run bzcat on the file:

bzcat hello.bz2
hello world

bzcat: hello.bz2: trailing garbage after EOF ignored
A: 

How do you have stray characters beyond the end of the file?

If you mean that the file has garbage data in it, how would the decompression algorithm be able to tell whether or not the data is garbage to be able to make a decision to throw?

Mark B
Here is what I see when I run bzcat on the file: "bzcat: hello.bz2: trailing garbage after EOF ignored". To be honest, I am more interested in finding a way to capture the error from the decompressor.
CodeMedic
+1  A: 

Research

std::ios_base::failure is "the base class for the types of all objects thrown as exceptions, by functions in the Iostreams library, to report errors detected during stream buffer operations."

Looking at the boost docs:

class bzip2_error : public std::ios_base::failure {
public:
    bzip2_error(int error);
    int error() const;
};

bzip2_error is a specific exception thrown when using the bzip2 filter, which inherits from std::ios_base::failure. As you can see, it is constructed by passing in an integer representing the error code. It also has a method error() which returns the error code it was constructed with.
The docs list bzip2 error codes as the following:

  • data_error - Indicates that the compressed data stream is corrupted. Equal to BZ_DATA_ERROR.
  • data_error_magic - Indicates that the compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'. Equal to BZ_DATA_ERROR_MAGIC.
  • config_error - Indicates that libbzip2 has been improperly configured for the current platform. Equal to BZ_CONFIG_ERROR.
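
Since bzip2_error derives from std::ios_base::failure, a handler for the base class should catch it too. Here is a minimal sketch of that idea which compiles without Boost. Note that demo_bzip2_error, the demo_bzip2 namespace, describe() and catch_via_base() are stand-ins I made up for illustration; only the numeric values are the BZ_* constants from bzlib.h.

```cpp
#include <cassert>
#include <ios>
#include <string>

// Hypothetical stand-in for boost::iostreams::bzip2_error, mirroring the
// interface quoted above so the sketch builds without Boost.
class demo_bzip2_error : public std::ios_base::failure {
public:
    explicit demo_bzip2_error(int error)
        : std::ios_base::failure("bzip2 error"), error_(error) {}
    int error() const { return error_; }
private:
    int error_;
};

// Stand-ins for the boost::iostreams::bzip2 constants; the values are the
// libbzip2 ones from bzlib.h.
namespace demo_bzip2 {
    const int data_error       = -4;  // BZ_DATA_ERROR
    const int data_error_magic = -5;  // BZ_DATA_ERROR_MAGIC
    const int config_error     = -9;  // BZ_CONFIG_ERROR
}

// Maps an error code to the description given in the list above.
std::string describe(int code) {
    switch (code) {
        case demo_bzip2::data_error:
            return "compressed data stream is corrupted";
        case demo_bzip2::data_error_magic:
            return "compressed data stream does not begin with BZh";
        case demo_bzip2::config_error:
            return "libbzip2 has been improperly configured";
        default:
            return "unrecognized bzip2 error code";
    }
}

// Throws the stand-in error, catches it through the base class, and
// recovers the code with dynamic_cast.
std::string catch_via_base(int code) {
    try {
        throw demo_bzip2_error(code);
    } catch (const std::ios_base::failure& e) {
        const demo_bzip2_error* bz = dynamic_cast<const demo_bzip2_error*>(&e);
        assert(bz != nullptr);
        return describe(bz->error());
    }
}
```

Catching the base class is handy as a fallback handler; catching the derived type directly, as in the code further down, avoids the cast.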

Code

EDIT: I also want to clarify that the exception here comes from the bzip2 filter, not from boost::iostreams::copy() itself. copy() only drives the stream and its filters; any exception they throw propagates out through the copy() call.

EDIT 2: It appears the problem is in bzip2_decompressor_impl, as you suspected. I have reproduced the endless spinning loop when the bz2 file is empty. It took me a little while to figure out how to build Boost and link against the bzip2, zlib, and iostreams libraries so I could try to reproduce your results.

g++ test.cpp -lz -lbz2 boostinstall/boost/bin.v2/libs/iostreams/build/darwin-4.2.1/release/link-static/threading-multi/libboost_iostreams.a -Lboostinstall/boost/bin.v2/libs/ -Iboost/include/boost-1_42 -g

test.cpp:

#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>

int main()
{
    using namespace std;
    using namespace boost::iostreams;

    try {
        ifstream file("hello.bz2", ios_base::in | ios_base::binary);
        filtering_streambuf<input> in;
        in.push(bzip2_decompressor());
        in.push(file);
        boost::iostreams::copy(in, cout);
    }
    catch(const bzip2_error& exception) {
        int error = exception.error();

        if(error == boost::iostreams::bzip2::data_error) {
            // compressed data stream is corrupted
            cout << "compressed data stream is corrupted";
        }
        else if(error == boost::iostreams::bzip2::data_error_magic)
        {
            // compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'
            cout << "compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'";
        }
        else if(error == boost::iostreams::bzip2::config_error) {
            // libbzip2 has been improperly configured for the current platform
            cout << "libbzip2 has been improperly configured for the current platform";
        }
    }
}

debugging:

gdb a.out
(gdb) b bzip2.hpp:344

A loop in symmetric.hpp:109 drives the bzip2 decompression:

        while (true)
        {
            // Invoke filter if there are unconsumed characters in buffer or if
            // filter must be flushed.
            bool flush = status == f_eof;
            if (buf.ptr() != buf.eptr() || flush) {
                const char_type* next = buf.ptr();
                bool done =
                    !filter().filter(next, buf.eptr(), next_s, end_s, flush);
                buf.ptr() = buf.data() + (next - buf.data());
                if (done)
                    return detail::check_eof(
                               static_cast<std::streamsize>(next_s - s)
                           );
            }

            // If no more characters are available without blocking, or
            // if read request has been satisfied, return.
            if ( (status == f_would_block && buf.ptr() == buf.eptr()) ||
                 next_s == end_s )
            {
                return static_cast<std::streamsize>(next_s - s);
            }

            // Fill buffer.
            if (status == f_good)
                status = fill(src);
        }

bzip2_decompressor_impl's filter method (bzip2.hpp:344) gets called at symmetric.hpp:117:

template<typename Alloc>
bool bzip2_decompressor_impl<Alloc>::filter
    ( const char*& src_begin, const char* src_end,
      char*& dest_begin, char* dest_end, bool /* flush */ )
{
    if (!ready())
        init();
    if (eof_)
        return false;
    before(src_begin, src_end, dest_begin, dest_end);
    int result = decompress();
    after(src_begin, dest_begin);
    bzip2_error::check BOOST_PREVENT_MACRO_SUBSTITUTION(result);
    return !(eof_ = result == bzip2::stream_end);
}

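I think the problem is simple: with an empty input, bzip2_decompressor_impl's eof_ flag never becomes true. Unless it's supposed to happen in some magic way I don't understand, the flag is owned by the bzip2_decompressor_impl class and the only assignment I can see is the last line of filter(), which sets it to true only when decompress() returns bzip2::stream_end. With no input that never happens. So when we do this: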

cat /dev/null > hello.bz2

We get a spinning loop that never ends; it does not break when EOF is hit. This is certainly a bug, because other programs (like vim) have no problem opening a file created this way. However, I am able to get the filter to throw when the bz2 file is "corrupted":

echo "other corrupt" > hello.bz2
./a.out
compressed data stream does not begin with the 'magic' sequence 'B' 'Z' 'h'
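
As a possible workaround for the empty-file case, one could peek at the header before handing the stream to the filter. This is only a sketch, and check_bzip2_header and header_check are names I made up. It relies on the bzip2 file format starting with 'B' 'Z' 'h' followed by a block-size digit '1' to '9', and it assumes the stream is seekable (an ifstream is).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type for the pre-check below.
enum class header_check { empty, bad_magic, ok };

// Reads up to four bytes from a seekable stream, then restores the read
// position so the stream can still be pushed onto a filter chain afterwards.
header_check check_bzip2_header(std::istream& in) {
    assert(in.good());
    const std::istream::pos_type start = in.tellg();

    char header[4] = {0, 0, 0, 0};
    in.read(header, sizeof header);
    const std::streamsize got = in.gcount();

    in.clear();       // a short read sets eofbit/failbit; reset before seeking
    in.seekg(start);  // rewind to where we started

    if (got == 0)
        return header_check::empty;

    const bool magic = got == 4 &&
                       header[0] == 'B' && header[1] == 'Z' &&
                       header[2] == 'h' &&
                       header[3] >= '1' && header[3] <= '9';
    return magic ? header_check::ok : header_check::bad_magic;
}
```

Calling this on the ifstream before in.push(file) would let the program report an empty file itself instead of entering the decompression loop. It does not detect trailing garbage or corruption later in the file.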

Sometimes you have to take open source code with a grain of salt. It is more likely that your bz2 files will be corrupted in a way that throws properly. However, the /dev/null case is a serious bug. We should report it to the Boost developers so they can fix it.
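
To make the /dev/null case concrete, here is a toy model of the situation, not Boost's actual code. Once the source is at EOF, the loop keeps calling the filter until it reports that it is done; a filter that never produces output and never finishes keeps the loop spinning. The names stalled_filter, draining_filter and drive_at_eof, and the idle-step guard, are all hypothetical and are not the fix that Boost adopted.

```cpp
#include <cassert>
#include <cstddef>

enum class outcome { finished, gave_up };

// Models the filter as it behaves on an empty input in the analysis above:
// it produces nothing and never reports end-of-stream.
struct stalled_filter {
    // Returns true while the filter wants to be called again.
    bool step(std::size_t& produced) { produced = 0; return true; }
};

// Models a healthy filter that emits one unit per call, then finishes.
struct draining_filter {
    std::size_t pending;
    bool step(std::size_t& produced) {
        if (pending == 0) { produced = 0; return false; }
        produced = 1;
        --pending;
        return true;
    }
};

// Drives a filter after the source has hit EOF. Without the idle counter
// this loop would never return for stalled_filter; with it, the loop gives
// up after max_idle_steps consecutive calls that produced nothing.
template <typename Filter>
outcome drive_at_eof(Filter& filter, std::size_t max_idle_steps) {
    assert(max_idle_steps > 0);
    std::size_t idle = 0;
    while (true) {
        std::size_t produced = 0;
        const bool more = filter.step(produced);
        if (!more)
            return outcome::finished;
        idle = (produced == 0) ? idle + 1 : 0;
        if (idle >= max_idle_steps)
            return outcome::gave_up;  // real code would throw here
    }
}
```

In a real filter the sensible reaction to "at EOF, no progress" would be to throw an error rather than return a status, which is roughly the observable behavior one would hope for in the empty-file case.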

manifest
@manifest Thanks a lot for the detailed response. Unfortunately it hasn't helped in my particular case, and I'm not sure what is going on under the hood. There seems to be something fishy about boost iostreams and its filtering mechanism. Maybe it is just the newcomer, the bzip2 functionality. One thing I noticed is that the filter mechanism fails to detect that something is wrong when I feed it a malformed bz2 file. The easiest way to see this is to create an empty hello.bz2 and run the code above: CPU usage spikes to 99%.
CodeMedic
By empty hello.bz2, I mean "cat /dev/null > hello.bz2".
CodeMedic
See the edit; I believe there is a bug in the bzip2 filter implementation.
manifest
While thinking about submitting a bug, I checked out Boost from trunk. The bzip2 filter code there is completely different, so they may have fixed it. Will update here later.
manifest
When I checked out trunk, built it with bjam and relinked, running cat /dev/null > hello.bz2 and then ./a.out results in "libbzip2 has been improperly configured for the current platform". Although this may not be the exception I would expect, at least it's an exception instead of endless spinning. A properly created .bz2 file still works, and a corrupted file throws the magic sequence exception. I would recommend making sure you link to the latest libraries.
manifest
@manifest Thanks a million!
CodeMedic
Hey, you're welcome. Thanks for the learning experience :)
manifest
In case anyone would like to work around this issue in Boost 1.43: get the iostreams source from the release branch and compile with "define=BOOST_IOSTREAMS_USE_DEPRECATED". The regression tests might fail, but you can replace libs/iostreams/test/detail/temp_file.hpp with the one from 1.43.
CodeMedic