I am running simulation code that is largely bound by CPU speed. I am not interested in pushing data in/out to a user interface, simply saving it to disk as it is computed.

What would be the fastest solution that would reduce overhead? iostreams? printf? I have previously read that printf is faster. Will this depend on my code and is it impossible to get an answer without profiling?

Edit:

  1. Output data needs to be in text format, whether tab or comma separated. This will require formatting, precision, etc.
  2. Running in Windows.
+3  A: 

I haven't used them myself, but I've heard memory mapped files offer the best optimisation opportunities to the OS.

Edit: related question, and Wikipedia article on memory mapped files — both mention performance benefits.

AshleysBrain
Too bad there's no portable way :( +1
Billy ONeal
+4  A: 

Construct (large-ish) blocks of data which can be sequentially written and use asynchronous IO.
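
A minimal sketch of the block-building half of this suggestion, assuming each row is one integer and one double. The names `append_row` and `flush_block` are hypothetical, and the asynchronous submission itself is left out because it is OS specific; here the block is simply handed over with one `fwrite` call.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats one row as tab-separated text and appends it
// to an in-memory block instead of touching the file.
void append_row(std::string& block, int step, double value) {
    char line[512];  // large enough for any int plus any double in %.6f form
    int n = std::snprintf(line, sizeof line, "%d\t%.6f\n", step, value);
    if (n > 0 && static_cast<std::size_t>(n) < sizeof line)
        block.append(line, static_cast<std::size_t>(n));
}

// Hypothetical helper: writes the accumulated block with a single call and
// clears it. Returns true when every byte was accepted by the C library.
bool flush_block(std::FILE* out, std::string& block) {
    std::size_t written = std::fwrite(block.data(), 1, block.size(), out);
    bool ok = (written == block.size());
    block.clear();  // keeps the capacity, so the next block reuses the memory
    return ok;
}
```

The simulation loop would call `append_row` for every result and `flush_block` only once the block has grown to some chosen size, which is a tuning parameter to measure rather than a fixed recommendation.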

Accurate profiling will be painful; read some papers on the subject: scholar.google.com.

Hassan Syed
A: 

The fastest way is completion-based asynchronous IO.

By giving the OS a set of data to write, which it hasn't actually written when the call returns, the OS can reorder it to optimise write performance.

The API for doing this is OS specific: on Linux, it's called AIO; on Windows, it's called I/O Completion Ports.

Will
+1  A: 

Open the file in binary mode, and write "unformatted" data to the disc.

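#include <fstream>
using namespace std;

fstream myFile;
...
// ios::binary disables newline translation.
// With ios::in | ios::out the file must already exist.
myFile.open("mydata.bin", ios::in | ios::out | ios::binary);
...
// A struct keeps the members public so they can be filled in before writing.
struct Data {
    int    key;
    double value;
    char   desc[10];
};

Data x;

myFile.seekp(location1);
// Writes the raw bytes of x, padding included.
myFile.write(reinterpret_cast<const char*>(&x), sizeof(Data));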

EDIT: The OP added the "Output data needs to be in text format, whether tab or comma separated." constraint.

If your application is CPU bound, the formatting of output is an overhead that you do not need. Binary data is much faster to write and read than ascii, is smaller on the disc (i.e. fewer total bytes are written with binary than with ascii), and because it is smaller it is faster to move around a network (including a network mounted file system). All indicators point to binary as a good overall optimization.

Viewing the binary data can be done after the run with a simple utility that will dump the data to ascii in whatever format is needed. I would encourage some version information be added to the resulting binary data to ensure that changes in the format of the data can be handled in the dump utility.
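
A rough sketch of what such a versioned format and its dump utility could look like. The layout (a 4-byte magic, a 4-byte version, then records of one 32-bit key and one double), the magic value `SIMD` and all function names are assumptions made for illustration. Values are stored in the machine's native byte order, so the file is only portable between machines with the same representation.

```cpp
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <istream>
#include <ostream>

// Hypothetical file layout: 4-byte magic, 4-byte format version, then records.
constexpr char kMagic[4] = {'S', 'I', 'M', 'D'};
constexpr std::uint32_t kVersion = 1;

struct Record {
    std::int32_t key;
    double value;
};

void write_header(std::ostream& out) {
    out.write(kMagic, sizeof kMagic);
    out.write(reinterpret_cast<const char*>(&kVersion), sizeof kVersion);
}

// Fields are written one by one so struct padding never reaches the file.
void write_record(std::ostream& out, const Record& r) {
    out.write(reinterpret_cast<const char*>(&r.key), sizeof r.key);
    out.write(reinterpret_cast<const char*>(&r.value), sizeof r.value);
}

// Core of the dump utility: converts the binary stream to tab-separated text.
// Returns false if the magic or version is not the one this code knows.
bool dump_as_text(std::istream& in, std::ostream& out) {
    char magic[4];
    std::uint32_t version = 0;
    in.read(magic, sizeof magic);
    in.read(reinterpret_cast<char*>(&version), sizeof version);
    if (!in || std::memcmp(magic, kMagic, sizeof magic) != 0 || version != kVersion)
        return false;

    Record r;
    while (in.read(reinterpret_cast<char*>(&r.key), sizeof r.key) &&
           in.read(reinterpret_cast<char*>(&r.value), sizeof r.value)) {
        out << r.key << '\t' << std::fixed << std::setprecision(6) << r.value << '\n';
    }
    return true;
}
```

The simulation would open its output with `ios::binary`, call `write_header` once and `write_record` per result; a later format change would bump `kVersion` and add a branch in `dump_as_text`.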

Moving from binary to ascii, and then quibbling over the relative performance of printf versus iostreams is likely not the best use of your time.

semiuseless
+3  A: 

Scott Meyers' More Effective C++, Item 23 "Consider alternative libraries", suggests using stdio over iostream if you prefer speed over safety and extensibility. It's worth checking.

stefaanv
+3  A: 

My thought is that you are tackling the wrong problem. Why are you writing out vast quantities of text formatted data? If it is because you want it to be human readable, write a quick browser program that reads the binary data on the fly. This way the simulation application can quickly write out binary data and the browser can do the grunt work of formatting the data as and when needed. If it is because you are using some stats package to read and analyse text data, then write one that reads binary data instead.

ravenspoint
+1  A: 

The fastest way is what is fastest for your particular application running on its typical target OS and hardware. The only sensible thing to do is to try several approaches and time them. You probably don't need a complete profile, and the exercise should only take a few hours. I would test, in this order:

  • normal C++ stream I/O
  • normal stream I/O using ostream::write()
  • use of the C I/O library
  • use of system calls such as write()
  • asynch I/O

And I would stop when I found a solution that was fast enough.

anon
+1  A: 

Text format means it's for human consumption. The speed at which humans can read is far, far lower than the speed of any reasonable output method. There's a contradiction somewhere. I suspect it is the "output must be text format" requirement.

Therefore, I believe the correct way is to output binary, and provide a separate viewer to convert individual entries to readable text. Formatting in the viewer need only be as fast as people can read.

MSalters
A: 

A fast method is to use double buffering and multiple threads (at least two).

One thread is in charge of writing data to the hard drive. This task checks the buffer and if not empty (or another rule perhaps) begins writing to the hard drive.

The other thread writes formatted text to the buffer.

One performance issue with hard drives is the amount of time required to get up to speed and position the head at the correct location. To avoid this, the objective is to write to the hard drive continually so that it doesn't stop. This is tricky and may involve things outside of your program's scope (such as other programs running at the same time). The larger the chunk of data written to the hard drive, the better.

Another thorn is finding empty slots on the hard drive to put the data. A fragmented hard drive would be slower than a formatted or defragmented drive.

If portability is not an issue, you can check your OS for some APIs that perform block writes to the hard drive. Or you can go down lower and use the API that writes directly to the drive.

You may also want your program to change its priority so that it is one of the most important tasks running.

Thomas Matthews
I am not sure that adding threads for IO is the right move. The main computational loop is CPU bound. If the machine is multi-core, then a better overall optimization would likely be to add parallelism to the computational portion of the code. If the machine is not multi-core, then adding threads for IO when the main loop is already CPU bound may not increase the overall throughput of the application.
semiuseless
A: 

Writing to a memory-mapped file might help?

Marcus Lindblom