tags:
views: 73
answers: 1

My program reads dozens of very large files in parallel, just one line at a time. It seems like the major performance bottleneck is HDD seek time from file to file (though I'm not completely sure how to verify this), so I think it would be faster if I could buffer the input.

I'm using C++ code like this to read my files through boost::iostreams "filtering streams":
input = new filtering_istream;
input->push(gzip_decompressor());
file_source in(fname);
input->push(in);

According to the documentation, file_source does not have any way to set the buffer size, but filtering_stream::push seems to:
void push( const T& t,
std::streamsize buffer_size,
std::streamsize pback_size );

So I tried input->push(in, 1E9) and indeed my program's memory usage shot up, but the speed didn't change at all.
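
In case it helps, the whole buffered attempt was roughly this (the same chain as in the snippet above, just with the buffer_size argument added):

input = new filtering_istream;
input->push(gzip_decompressor());
file_source in(fname);
input->push(in, 1E9); // buffer_size argument; pback_size left at its default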

Was I simply wrong that read buffering would improve performance? Or did I do this wrong? Can I buffer a file_source directly, or do I need to create a filtering_streambuf? If the latter, how does that work? The documentation isn't exactly full of examples.

A: 

You should profile it to see where the bottleneck is.

Perhaps it's in the kernel, perhaps you're at your hardware's limit. Until you profile it to find out, you're stumbling in the dark.

EDIT:

Ok, a more thorough answer this time, then. According to the Boost.Iostreams documentation, basic_file_source is just a wrapper around std::filebuf, which in turn is built on std::streambuf. To quote the documentation:

CopyConstructible and Assignable wrapper for a std::basic_filebuf opened in read-only mode.

streambuf does provide a method pubsetbuf (not the best reference perhaps, but the first one Google turned up) which you can, apparently, use to control the buffer size.

For example:

#include <fstream>

int main()
{
  char buf[4096];
  std::ifstream f;
  // pubsetbuf typically has to be called before the file is opened
  // (or at least before any I/O) for the buffer to actually be used
  f.rdbuf()->pubsetbuf(buf, 4096);
  f.open("/tmp/large_file", std::ios::binary);

  // drain the file in 1 KiB chunks; the data itself is thrown away
  while( !f.eof() )
  {
      char rbuf[1024];
      f.read(rbuf, 1024);
  }

  return 0;
}

In my test (optimizations off, though) I actually got worse performance with a 4096-byte buffer than with a 16-byte buffer, but YMMV -- a good example of why you should always profile first :)

But, as you say, basic_file_source does not provide any means to access this, as it hides the underlying filebuf as a private member.
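
If all you want is a bigger buffer on the device at the end of the chain, one possible workaround is to skip file_source altogether and push a std::ifstream whose filebuf you've given a larger buffer -- as far as I know a standard stream works as the terminal device of a chain. Untested sketch (the file name is just a placeholder):

#include <fstream>
#include <string>
#include <vector>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>

namespace io = boost::iostreams;

int main()
{
    std::string fname = "/tmp/large_file.gz";    // stand-in for your real file name

    // give the underlying filebuf a 1 MiB buffer *before* opening the file;
    // buf has to stay alive for as long as the stream is used
    std::vector<char> buf(1 << 20);
    std::ifstream file;
    file.rdbuf()->pubsetbuf(&buf[0], static_cast<std::streamsize>(buf.size()));
    file.open(fname.c_str(), std::ios::binary);

    io::filtering_istream input;
    input.push(io::gzip_decompressor());
    input.push(file);    // the std::istream is held by reference as the terminal device

    std::string line;
    while (std::getline(input, line))
    {
        // use the decompressed line
    }
    return 0;
}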

If you think this is wrong you could:

  1. Urge the Boost developers to expose such functionality; use the mailing list or the Trac.
  2. Build your own filebuf wrapper which does expose the buffer size. There's a section in the tutorial which explains writing custom sources that might be a good starting point.
  3. Write a custom source that does all the caching you fancy (see the sketch just after this list).
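
For option 3, something along these lines might do (untested sketch; all the names are made up) -- a Source that slurps the file in big chunks into its own buffer and hands the data out from there:

#include <algorithm>
#include <fstream>
#include <string>
#include <vector>
#include <boost/iostreams/concepts.hpp>   // boost::iostreams::source
#include <boost/shared_ptr.hpp>

namespace io = boost::iostreams;

class readahead_source : public io::source
{
public:
    explicit readahead_source(const std::string& path, std::size_t chunk = 1 << 20)
        : d_(new state(path, chunk)) {}

    std::streamsize read(char* s, std::streamsize n)
    {
        state& d = *d_;
        if (d.pos == d.buf.size())        // buffer exhausted: refill with one big read
        {
            d.buf.resize(d.chunk);
            d.file.read(&d.buf[0], static_cast<std::streamsize>(d.buf.size()));
            d.buf.resize(static_cast<std::size_t>(d.file.gcount()));
            d.pos = 0;
            if (d.buf.empty())
                return -1;                // nothing left: signal end-of-sequence
        }
        std::size_t count = std::min<std::size_t>(d.buf.size() - d.pos,
                                                  static_cast<std::size_t>(n));
        std::copy(d.buf.begin() + d.pos, d.buf.begin() + d.pos + count, s);
        d.pos += count;
        return static_cast<std::streamsize>(count);
    }

private:
    struct state                          // shared, so copies of the device share it
    {
        state(const std::string& path, std::size_t c)
            : file(path.c_str(), std::ios::binary), chunk(c), pos(0) {}
        std::ifstream file;
        std::size_t chunk;
        std::vector<char> buf;
        std::size_t pos;
    };
    boost::shared_ptr<state> d_;
};

// usage, e.g.:
//   io::filtering_istream input;
//   input.push(io::gzip_decompressor());
//   input.push(readahead_source(fname));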

Remember that both your hard drive and the kernel already do caching and buffering on file reads, so I don't think you'll get much of a performance increase from caching even more.

And in closing, a word on profiling. There's a ton of powerful profiling tools available for Linux and I don't even know half of them by name, but for example there's iotop, which is kind of neat because it's super simple to use. It's pretty much like top, but it shows disk-related metrics instead. For example:

Total DISK READ: 31.23 M/s | Total DISK WRITE: 109.36 K/s
TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND          
19502 be/4 staffan    31.23 M/s    0.00 B/s  0.00 % 91.93 % ./apa

tells me that my program spends over 90% of its time waiting for I/O, i.e. it's I/O bound. If you need something more powerful, I'm sure Google can help you.

And remember that benchmarking on a hot or cold cache greatly affects the outcome.

Staffan
Maybe this is a question for another post, but how would I profile it? I know how to use gprof, but it only tells me about CPU time, and here I'm pretty sure the bottleneck is disk I/O. Or if someone can tell me how to set the buffer size correctly, I can just try it and see if it helps.
@jwfoley: I like [Valgrind's](http://valgrind.org/) callgrind profiler. As far as my experience goes (read as: I can't guarantee anything) it reports time spent in kernel calls as well, something I could never get gprof to do. I use it to profile an application with OpenGL for example, and it correctly reports time spent in the video driver code. It's very easy to use (valgrind --tool=callgrind ./your-app). Use [KCachegrind](http://kcachegrind.sourceforge.net/html/Home.html) to interpret the results. The only catch is that your application will run 20 times or so slower while profiling.
Staffan
@Staffan: Okay, I tried callgrind + KCachegrind and I'm impressed with the profiler, but I still don't know what I'm looking for. The results look pretty similar to gprof's. Something called T.3577 has a high "Incl." but low "Self"; most of its time seems to be spent in std::basic_ios. Perhaps that's the disk I/O? I'd still like an answer to my original question of how to set the buffer size. If it's easy, then I can just try it and see if it helps, but it would be useful to know anyway.
@jwfoley: Ok see if my update above helps you out
Staffan
@Staffan: Thanks for your very thorough answer. Unfortunately, the most important part was this: "Remember that your hard drive as well as the kernel already does caching and buffering on file reads". I can't use iotop unless I recompile my kernel with CONFIG_TASK_DELAY_ACCT=y, but judging from the fact that my memory fills up with cache when I run my program, it seems pretty reasonable to assume there's nothing to gain by customizing my own caching. Perhaps that part is as fast as it's going to get. At any rate, I've learned a few things, and I wish I could upvote this so others can benefit.
@jwfoley: Oh, iotop runs out of the box on my Fedora machine. I guess it's not so simple to use if you have to re-compile the kernel, though :) If the answer was to your satisfaction, it's good Stack Overflow etiquette to accept it.
Staffan