views: 293
answers: 6

Hi,

I'm working on a piece of scientific software that is very CPU-intensive (it's processor-bound), but it needs to write data to disk fairly often (I/O-bound).

I'm adding parallelization (OpenMP) and I'm wondering about the best way to handle the write-to-disk needs. There's no reason the simulation should wait on the HDD (which is what it does now).

I'm looking for a 'best practice' for this, and speed is what I care about most (these simulations can run for a very long time).

Thanks ~Alex

First thoughts:

Having a separate process do the actual writing to disk, so the simulation has two processes: one CPU-bound (the simulation) and one I/O-bound (writing the file). This sounds complicated.

Possibly a pipe/buffer? I'm kind of new to these, so maybe that could be a possible solution.

+5  A: 

I'd say the best way would be to spawn a different thread to save the data, not a completely new process; with a new process, you run into the trouble of having to communicate the data to be saved across the process boundary, which introduces a new set of difficulties.

McWafflestix
A: 

One thread continually executes a step of the computationally-intensive process and then adds the partial result to a queue of partial results. Another thread continually removes partial results from the queue and writes them to disk. Make sure to synchronize access to the queue. A queue is a list-like data structure where you can add items to the end and remove items from the front.
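A minimal sketch of this producer/consumer queue using C++11 threads (the `Result` type, function names, and sizes here are illustrative, not from the original post):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical partial result; a real simulation would put its own data here.
struct Result { int step; double value; };

class ResultQueue {
public:
    void push(Result r) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(r);
        cv_.notify_one();
    }
    // Blocks until an item is available; returns false once closed and drained.
    bool pop(Result& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        cv_.notify_all();
    }
private:
    std::queue<Result> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
};

// Runs n simulation steps; returns how many results the writer thread consumed.
int run_pipeline(int n) {
    ResultQueue queue;
    int written = 0;
    std::thread writer([&] {
        Result r;
        while (queue.pop(r))
            ++written;                 // a real writer would fwrite/fstream here
    });
    for (int i = 0; i < n; ++i)
        queue.push({i, i * 0.5});      // "compute", then hand off to the writer
    queue.close();
    writer.join();
    return written;
}
```

The condition variable is what keeps the writer thread from spinning while the queue is empty, and `close()` lets it exit cleanly when the simulation is done.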

Justice
A: 

Since you are CPU- and I/O-bound, let me guess: there is still plenty of memory available, right?

If so, you should buffer the data that has to be written to disk in memory, up to a certain extent. Writing large chunks of data is usually a lot faster than writing many small pieces.

For the writing itself, consider using memory-mapped I/O. It's been a while since I benchmarked it, but last time I did it was significantly faster.

Also, you can always trade off CPU vs. I/O a bit. I assume you're currently writing the data in some kind of raw, uncompressed format, right? You may gain some I/O performance if you use a simple compression scheme to reduce the amount of data to be written. The zlib library is pretty easy to work with and compresses very fast at its lowest compression level. It depends on the nature of your data, but if there is a lot of redundancy in it, even a very crude compression algorithm may eliminate the I/O-bound problem.

Nils Pipenbrinck
+3  A: 

The first solution that comes to mind is pretty much what you've said - having disk writes in their own process with a one-way pipe from the sim to the writer. The writer does writes as fast as possible (drawing new data off the pipe). The problem with this is that if the sim gets too far ahead of the writer, the sim is going to be blocking on the pipe writes anyway, and it will be I/O bound at one remove.

The problem is that in fact your simulation cycle isn't complete until it's spit out the results.

The second thing that occurs to me is to use non-blocking I/O. Whenever the sim needs to write, it should do so via non-blocking I/O. On the next need to write, it can then pick up the results of the previous I/O operation (possibly incurring a small wait) before starting the new one. This keeps the simulation running as much as possible in parallel with the I/O without letting the simulation get very far ahead of the writing.

The first solution would be better if the simulation processing cycle varies (sometimes shorter than the time for a write, sometimes longer), because on average the writes might keep up with the sim.

If the processing cycle is always (or almost always) going to be shorter than the write time then you might as well not bother with the pipe and just use non-blocking I/O, because if you use the pipe it will eventually fill up and the sim will get hung up on the I/O anyway.

Michael Kohne
I think the one-way pipe is the way I'll go then. I don't think I'll run into the blocking issue too badly; there's not a LOT of data being generated, I just wanted to separate the threads. If I were generating that much data, I'd reconsider how much actually needs to be kept.
ajray
A: 

Make your application have two threads, one for CPU and one for the hard disk.

Have the CPU thread push completed data into a queue which the hard disk thread then pulls from as data comes in.

This way the CPU just gets rid of the data and lets someone else handle it and the hard drive just patiently waits for any data in its queue.

Implementation-wise, you could do the queue as a shared-memory object, but I think a pipe would be exactly what you're looking for. The CPU thread simply writes to the pipe when needed. On the hard-disk side, you just read the pipe, and whenever you get valid data, proceed from there.
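A sketch of this pipe variant using POSIX `pipe()` between two threads in one process (the function name and buffer sizes are illustrative). The kernel's pipe buffer acts as the bounded queue: writes only block if the reader falls far behind:

```cpp
#include <cstddef>
#include <thread>
#include <unistd.h>

// Push n doubles through a pipe; returns the bytes the "disk" thread drained.
size_t run_through_pipe(int n) {
    int fds[2];
    if (pipe(fds) != 0) return 0;     // fds[0] = read end, fds[1] = write end

    size_t drained = 0;
    std::thread disk([&] {
        char buf[4096];
        ssize_t got;
        while ((got = read(fds[0], buf, sizeof buf)) > 0)
            drained += static_cast<size_t>(got);  // a real reader would fwrite here
        close(fds[0]);
    });

    for (int i = 0; i < n; ++i) {
        double result = i * 0.5;                  // "simulation" output
        ssize_t put = write(fds[1], &result, sizeof result);
        if (put != sizeof result) break;
    }
    close(fds[1]);                                // EOF lets the reader exit
    disk.join();
    return drained;
}
```

Closing the write end is what terminates the reader's loop, so there is no need for an explicit sentinel value.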

samoz
+2  A: 

If you are adding OpenMP to your program, then it is better to use #pragma omp single or #pragma omp master inside the parallel section to save to file. These pragmas allow only one thread to execute the following block. So your code may look like the following:

#pragma omp parallel
{
    // Calculating the first part
    Calculate();

    // Wait for all threads to finish the first part before saving
    #pragma omp barrier

    #pragma omp master
    SaveFirstPartOfResults();

    // Calculate the second part
    Calculate2();

    #pragma omp barrier

    #pragma omp master
    SaveSecondPart();

    Calculate3();

    // ... and so on
}

Here the team of threads does the calculation, but only a single thread saves the results to disk.

This looks like a software pipeline. I suggest you consider the tbb::pipeline pattern from the Intel Threading Building Blocks library. There is a tutorial on software pipelines at http://cache-www.intel.com/cd/00/00/30/11/301132_301132.pdf#page=25; please read paragraph 4.2. It solves a similar problem: one thread reads from the drive, a second processes the strings it read, and a third saves them back to the drive.

Vova