views:

935

answers:

7

While troubleshooting some performance problems in our apps, I found out that C's stdio.h functions (and, at least for our vendor, C++'s fstream classes) are threadsafe. As a result, every time I do something as simple as fgetc, the RTL has to acquire a lock, read a byte, and release the lock.

This is not good for performance.

What's the best way to get non-threadsafe file I/O in C and C++, so that I can manage locking myself and get better performance?

  • MSVC provides _fputc_nolock, and GCC provides unlocked_stdio and flockfile, but I can't find any similar functions in my compiler (CodeGear C++Builder).
  • I could use the raw Windows API, but that's not portable and I assume would be slower than an unlocked fgetc for character-at-a-time I/O.
  • I could switch to something like the Apache Portable Runtime, but that could potentially be a lot of work.

How do others approach this?

Edit: Since a few people wondered, I had tested this before posting. fgetc doesn't do system calls if it can satisfy reads from its buffer, but it does still do locking, so locking ends up taking an enormous percentage of time (hundreds of locks to acquire and release for a single block of data read from disk). Not doing character-at-a-time I/O would be a solution, but C++Builder's fstream classes unfortunately use fgetc (so if I want to use iostream classes, I'm stuck with it), and I have a lot of legacy code that uses fgetc and friends to read fields out of record-style files (which would be reasonable if it weren't for locking issues).

A: 

I wouldn't bother, personally. The cost of acquiring a lock when there's no contention is miniscule compared to the cost of a file I/O.

Peter Ruderman
fgetc() is buffered; most calls to it perform no I/O, and only read out of a buffer.
Josh Haberman
+3  A: 

I'd simply not do IO a char at a time if it is sensible performancewize.

AProgrammer
all operations to streams result in char IO. You must work with stream buffers. I have posted below how...
ovanes
+1  A: 

fgetc is almost certainly not reading a byte each time you call it (where by 'reading' I mean invoking a system call to perform I/O). Look somewhere else for your performance bottleneck, as this is probably not the problem, and using unsafe functions is certainly not the solution. Any lock handling you do will probably be less efficient than the handling done by the standard routines.

William Pursell
It isn't reading a byte at a time, but it can very well take a lock each time. BTW, taking a lock is mandatory under POSIX, there is getc_unlocked() (and _unlocked variant of char by char IO functions, and lock functions so that they can be protected).
AProgrammer
+1  A: 

The easiest way would be to read the entire file in memory, and then provide your own fgetc-like interface to that buffer.

MSalters
+1  A: 

Why not just memory map the file? Memory mapping is portable (except in Windows Vista which requires you to jump through hopes to use it now, the dumbasses). Anyhow, map your file into memory, and do you're own locking/not-locking on the resulting memory location.

The OS handles all the locking required to actually read from the disk - you'll NEVER be able to eliminate this overhead. But your processing overhead, on the otherhand, won't be affected by extraneous locking other than that which you do yourself.

Chris Kaminski
+1  A: 

Hi!

the multi-platform approach is pretty simple. Avoid functions or operators where standard specifies that they should use sentry. sentry is an inner class in iostream classes which ensures stream consistency for every output character and in multi-threaded environment it locks the stream related mutex for each character being output. This avoids race conditions at low level but still makes the output unreadable, since strings from two threads might be output concurrently as the following example states:

thread 1 should write: abc
thread 2 should write: def

The output might look like: adebcf instead of abcdef or defabc. This is because sentry is implemented to lock and unlock per character.

The standard defines it for all functions and operators dealing with istream or ostream. The only way to avoid this is to use stream buffers and your own locking (per string for example).

I have written an app, which outputs some data to a file and mesures the speed. If you add here a function which ouptuts using the fstream directly without using the buffer and flush, you will see the speed difference. It uses boost, but I hope it is not a problem for you. Try to remove all the streambuffers and see the difference with and without them. I my case the performance drawback was factor 2-3 or so.

The following article by N. Myers will explain how locales and sentry in c++ IOStreams work. And for sure you should look up in ISO C++ Standard document, which functions use sentry.

Good Luck,
Ovanes

#include <vector>
#include <fstream>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <cassert>
#include <cstdlib>

#include <boost/progress.hpp>
#include <boost/shared_ptr.hpp>

double do_copy_via_streambuf()
{
  const size_t len = 1024*2048;
  const size_t factor = 5;
  ::std::vector<char> data(len, 1);

  std::vector<char> buffer(len*factor, 0);

  ::std::ofstream
    ofs("test.dat", ::std::ios_base::binary|::std::ios_base::out);
  noskipws(ofs);

  std::streambuf* rdbuf = ofs.rdbuf()->pubsetbuf(&buffer[0], buffer.size());

  ::std::ostreambuf_iterator<char> oi(rdbuf);

  boost::progress_timer pt;

  for(size_t i=1; i<=250; ++i)
  {
    ::std::copy(data.begin(), data.end(), oi);
    if(0==i%factor)
      rdbuf->pubsync();
  }

  ofs.flush();
  double rate = 500 / pt.elapsed();
  std::cout << rate << std::endl;
  return rate;
}

void count_avarage(const char* op_name, double (*fct)())
{
    double av_rate=0;
    const size_t repeat = 1;
    std::cout << "doing " << op_name << std::endl;
    for(size_t i=0; i<repeat; ++i)
     av_rate+=fct();

    std::cout << "average rate for " << op_name << ": " << av_rate/repeat 
            << "\n\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n"
            << std::endl;
}


int main()
{
    count_avarage("copy via streambuf iterator", do_copy_via_streambuf);
    return 0;
}
ovanes
+1  A: 

One thing to consider is to build a custom runtime. Most compilers provide the source to the runtime library (I'd be surprised if it weren't in the C++ Builder package).

This could end up being a lot of work, but maybe they've localized the thread support to make something like this easy. For example, with the embedded system compiler I'm using, it's designed for this - they have documented hooks to add the lock routines. However, it's possible that this could be a maintenance headache, even if it turns out to be relatively easy initially.

Another similar route would be to talk to someone like Dinkumware about using a 3rd party runtime that provides the capabilities you need.

Michael Burr