ansaurus

Question

Fancy way to read a file in C++ : strange performance issue

Answer 1

+1 A:

The iterator approach reads the file one character at a time, while file.read() does it in a single hit.

If the operating system and file handlers know you want to read a large amount of data, there are lots of optimizations they can apply: for example, reading the whole file in a single revolution of the disk spindle, or skipping the copy from OS buffers to application buffers.

When you do byte-by-byte transfers, the OS has no clue what you really want to do, so it cannot perform such optimizations.

Roddy 2010-07-22 17:20:50

The filebuf object inside fstream reads the file block by block, so the file is read the same way in both cases. It's just the copying from filebuf to vector<char> that is slow.

Tomaka17 2010-07-22 17:26:16

Answer 2

+3 A:

Only profiling will tell you exactly why. My guess is that what you are seeing is just the overhead of all the extra function calls associated with the second method. Instead of a single call to bring in all the data, you are doing 1.6M calls*, or something along those lines.

* Many of them are virtual, which means a couple of extra CPU cycles per call. (Thanks, Zan)
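Short of a full profiler, a coarse timing harness can at least confirm how large the gap between two loading strategies is. The sketch below assumes a C++11 compiler; elapsed_ms and best_of are hypothetical helper names, and whatever callable you pass in stands for the code being measured.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: runs f once and returns the elapsed wall-clock time
// in milliseconds.
template <typename F>
double elapsed_ms(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Repeats the measurement and keeps the fastest run, which reduces noise
// from disk caching and scheduling. Assumes runs >= 1.
template <typename F>
double best_of(int runs, F f)
{
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i)
        best = std::min(best, elapsed_ms(f));
    return best;
}
```

Wrapping each loading strategy in a lambda and passing it to best_of gives two comparable numbers. It will not say which function is responsible, so it complements a profiler rather than replacing one.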

Gianni 2010-07-22 17:22:39

Yeah, and those calls are indirect virtuals. Those suck.

Zan Lynx 2010-07-22 17:43:23

Profiling tells me that a lot of functions each consume between 3% and 5% of the total; no single function comes out on top. I'm marking this as the accepted answer.

Tomaka17 2010-07-22 17:59:58

@Zan Yeah! Good point, forgot to mention: virtuals!

Gianni 2010-07-22 18:33:46

Answer 3

+2 A:

You should compare apples to apples.

Your first code reads unformatted binary data because you use the member function "read", and not because you use std::ios::binary, by the way. See http://stdcxx.apache.org/doc/stdlibug/30-4.html for a fuller explanation, but in short: "The effect of the binary open mode is frequently misunderstood. It does not put the inserters and extractors into a binary mode, and hence suppress the formatting they usually perform. Binary input and output is done solely by basic_istream<>::read() and basic_ostream<>::write()."

So your second code, with istream_iterator, reads formatted text. It's way slower.

If you want to read unformatted binary data, use istreambuf_iterator:

#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    std::ifstream file("file.txt", std::ios::binary);
    // The extra parentheses around the first argument keep the compiler
    // from parsing this line as a function declaration.
    std::vector<char> buffer((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}

On my platform (VS2008), istream_iterator is about 100x slower than read(). istreambuf_iterator performs better, but is still 10x slower than read().

Thomas Petit 2010-07-22 18:06:17

Thanks, I didn't think about locale, width to extract, etc.

Tomaka17 2010-07-22 18:48:43

But how can the processing of the data in memory be 100 times slower than the file I/O?

ruslik 2010-07-22 19:21:09
