I want to read unsigned bytes from a binary file. So I wrote the following code.

#include <iostream>
#include <fstream>
#include <vector>
#include <string>

std::string filename("file");
size_t bytesAvailable = 128;
size_t toRead = 128;

std::basic_ifstream<unsigned char> inF(filename.c_str(), std::ios_base::in | std::ios_base::binary);
if (inF.good())
{
    std::vector<unsigned char> mDataBuffer;
    mDataBuffer.resize(bytesAvailable) ;
    inF.read(&mDataBuffer[0], toRead) ;
    size_t counted = inF.gcount() ;
}

This always reads 0 bytes, as shown by the variable counted.

There seem to be references on the web saying that I need to set the locale to make this work. How to do this exactly is not clear to me.

The same code works using the data type 'char' instead of 'unsigned char'.

The above code using unsigned char seems to work on Windows but fails when running in a coLinux Fedora 2.6.22.18.

What do I need to do to get it to work on Linux?

+6  A: 

Don't use basic_ifstream<unsigned char>, as it requires a specialization.

Using a static buffer:

linux ~ $ cat test_read.cpp
#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
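        const size_t bytesAvailable = 128;   // const, so the array below has a compile-time size

        ifstream inf( filename.c_str(), ios_base::in | ios_base::binary );
        if( inf )
        {
                unsigned char mDataBuffer[ bytesAvailable ];
                inf.read( reinterpret_cast<char*>( &mDataBuffer[0] ), bytesAvailable ) ;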
                size_t counted = inf.gcount();
                cout << counted << endl;
        }

        return 0;
}
linux ~ $ g++ test_read.cpp
linux ~ $ echo "123456" > file
linux ~ $ ./a.out
7

Using a vector:

linux ~ $ cat test_read.cpp

#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
        size_t bytesAvailable = 128;
        size_t toRead = 128;

        ifstream inf( filename.c_str(), ios_base::in | ios_base::binary );
        if( inf )
        {

                vector<unsigned char> mDataBuffer;
                mDataBuffer.resize( bytesAvailable ) ;

                inf.read( (char*)( &mDataBuffer[0]), toRead ) ;
                size_t counted = inf.gcount();
                cout << counted << " size=" << mDataBuffer.size() << endl;
                mDataBuffer.resize( counted ) ;
                cout << counted << " size=" << mDataBuffer.size() << endl;

        }

        return 0;
}
linux ~ $ g++ test_read.cpp -Wall -o test_read
linux ~ $ ./test_read
7 size=128
7 size=7

Using reserve instead of resize in the first call (shown only to illustrate the pitfall, do not copy this one):

linux ~ $ cat test_read.cpp

#include <fstream>
#include <iostream>
#include <vector>
#include <string>


using namespace std;

int main( void )
{
        string filename("file");
        size_t bytesAvailable = 128;
        size_t toRead = 128;

        ifstream inf( filename.c_str() );
        if( inf )
        {

                vector<unsigned char> mDataBuffer;
                mDataBuffer.reserve( bytesAvailable ) ;

                inf.read( (char*)( &mDataBuffer[0]), toRead ) ;
                size_t counted = inf.gcount();
                cout << counted << " size=" << mDataBuffer.size() << endl;
                mDataBuffer.resize( counted ) ;
                cout << counted << " size=" << mDataBuffer.size() << endl;

        }

        return 0;
}
linux ~ $ g++ test_read.cpp -Wall -o test_read
linux ~ $ ./test_read
7 size=0
7 size=7

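As you can see, without the call to .resize( counted ), the size of the vector will be wrong. Please keep that in mind. Note that the reserve version is only a demonstration: writing through &mDataBuffer[0] while size() is still 0 is undefined behavior, and the later resize( counted ) value-initializes the new elements, overwriting whatever was read. Casting the buffer pointer to char* for read() is common practice; see cppreference.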
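
To pull the resize-then-read-then-shrink pattern above into one place, here is a minimal sketch. The helper name readBytes and its parameters are my own invention for illustration, not something from the original code, and it assumes you are fine with getting an empty vector back when the file cannot be opened.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): read at most maxBytes
// from the file at path and return exactly the bytes that were read.
std::vector<unsigned char> readBytes(const std::string& path, std::size_t maxBytes)
{
    std::vector<unsigned char> buffer(maxBytes);   // size() == maxBytes, zero-filled
    std::ifstream in(path.c_str(), std::ios_base::in | std::ios_base::binary);
    if (!in || maxBytes == 0)
    {
        buffer.clear();                            // nothing could be read
        return buffer;
    }

    // ifstream is a stream of char, so read() wants a char*. The cast only
    // changes how the destination bytes are viewed, not what gets stored.
    in.read(reinterpret_cast<char*>(&buffer[0]),
            static_cast<std::streamsize>(maxBytes));

    // gcount() is the number of characters the last read() actually delivered.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

With the 7-byte file from the sessions above, readBytes("file", 128) should come back with size() == 7.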

sfossen
This is reading signed chars. I know this works. I specifically want to read unsigned chars
David
just change the char[] to unsigned char[].
sfossen
and add the cast :P
sfossen
Is it possible to do so without casting?
David
sfossen
@David: There is no difference between signed and unsigned chars on disk. (or, for that matter, 4 chars and an int!)
Simon Buchan
@Simon: ints have endianness issues ;-)
Ryan Graham
There is nothing wrong with using vector. It's probably overkill in this case, since it's a fixed size, but it will work correctly.
KeithB
@KeithB: True enough, there are more caveats, such as thinking the size is correct without the resize, but I've added the 2 new versions to demonstrate.
sfossen
+6  A: 

C++ only requires the implementation to provide explicit specializations for two versions of character traits:

std::char_traits<char>
std::char_traits<wchar_t>

The streams and strings use those traits to figure out a variety of things, like the EOF value, comparison of a range of characters, widening of a character to an int, and such stuff.
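
To make those jobs concrete, here is a small sketch that calls the char specialization directly. The wrapper names widen, isEof and compareRange are made up for illustration; the members they forward to (to_int_type, eq_int_type, eof, compare) are the standard ones. It assumes a mainstream implementation where to_int_type goes through unsigned char, so that a 0xFF byte stays distinguishable from the EOF value.

```cpp
#include <cstddef>
#include <string>   // std::char_traits

typedef std::char_traits<char> Traits;

// Widen a char to the traits' int_type (int for char_traits<char>).
inline int widen(char c)
{
    return Traits::to_int_type(c);
}

// True if a widened value is the traits' end-of-file marker.
inline bool isEof(int value)
{
    return Traits::eq_int_type(value, Traits::eof());
}

// Compare the first n characters of two ranges: negative, zero or positive.
inline int compareRange(const char* a, const char* b, std::size_t n)
{
    return Traits::compare(a, b, n);
}
```

A stream instantiated on unsigned char needs a traits class that answers the same questions for that type, which is what the standard does not promise to supply.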

If you instantiate a stream like

std::basic_ifstream<unsigned char>

You have to make sure that there is a corresponding character traits specialization that the stream can use, and that this specialization does something useful. In addition, streams use facets to do the actual formatting and reading of numbers, and you likewise have to provide specializations of those manually. The standard doesn't even require the implementation to have a complete definition of the primary template, so you could just as well get a compile error:

error: specialization std::char_traits could not be instantiated.

I would use ifstream instead (which is a basic_ifstream<char>) and then go and read into a vector<char>. When interpreting the data in the vector, you can still convert them to unsigned char later.
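
A rough sketch of that suggestion, in case it helps. The names readChars and byteValue are hypothetical helpers I made up for this example; the idea is just to read with the ordinary char stream and convert each element only at the point where you interpret it.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: read up to maxBytes from path with a plain ifstream.
std::vector<char> readChars(const std::string& path, std::size_t maxBytes)
{
    std::vector<char> data(maxBytes);
    std::ifstream in(path.c_str(), std::ios_base::in | std::ios_base::binary);
    if (in && maxBytes != 0)
    {
        in.read(&data[0], static_cast<std::streamsize>(maxBytes));  // no cast needed
        data.resize(static_cast<std::size_t>(in.gcount()));         // keep only what was read
    }
    else
    {
        data.clear();
    }
    return data;
}

// Interpret one stored char as a byte value in the range 0..255.
inline unsigned int byteValue(char c)
{
    return static_cast<unsigned char>(c);
}
```

A byte written to disk as 0xFF comes back from byteValue as 255, whether or not plain char is signed on your platform.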

Johannes Schaub - litb
A: 

A much easier way:

#include <fstream>
#include <vector>

using namespace std;


int main()
{
    vector<unsigned char> bytes;
    ifstream file1("main1.cpp", ios_base::in | ios_base::binary);
    unsigned char ch = file1.get();
    while (file1.good())
    {
        bytes.push_back(ch);
        ch = file1.get();
    }
    size_t size = bytes.size();
    return 0;
}
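
If you want the same whole-file read without writing the loop by hand, one possible variation is to let the vector's range constructor walk the stream buffer. This is only a sketch and not part of the code above; readAllBytes is a name I made up, and it still goes through the stream one character at a time, so a block read() is likely to be faster for big files.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: load every byte of the file at path.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> readAllBytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios_base::in | std::ios_base::binary);

    // istreambuf_iterator pulls raw chars from the stream buffer with no
    // formatting and no whitespace skipping; each char is converted to
    // unsigned char as it is copied into the vector.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```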
rlbond
That is very inefficient. Try running benchmarks with 1GB files, the overhead of the calls will show a big difference.
sfossen
why does this work but a call to read fail?
David
Because the file is a signed char one!!!! I should have seen that.
David
@david: it makes no difference in the file. 0xFF is 255 if stored in an unsigned char or -1 if stored in the signed char. Hence why the cast is not a bad thing. If this was multi byte the only difference would be if the endianness is different.
sfossen
@David: endianness is usually only a problem when switching architecture types, e.g. PowerPC vs x86.
sfossen