views: 92
answers: 3
Hello,

I am running some long simulations that can take from several hours to several days, and I am logging information to files. The files can reach sizes of hundreds of MB and contain just a list of numbers. I am really concerned about the overhead this is causing. I would like to ask whether the overhead of this method is really significant, and whether there is a more efficient way to do the same thing, i.e. just log information.

I am using C++, and to write the files I just use ordinary fprintf calls. To quantify the overhead, a practical example along the lines of "with logging the run takes this long, without logging it takes this long" would be ideal.

I did some tests, but I have no idea whether the overhead grows linearly with the size of the file. What I mean is that appending a line to a 1 MB file may not cost the same as appending a line to a 1 GB file. Does anyone know how the overhead grows with the size of the file?

+2  A: 

"Hundreds of megabytes" is probably irrelevant in the course of a few days. Hundreds of gigabytes could well be significant, but probably still wouldn't be huge.

There's an obvious way of finding out the answer for your exact application though: run a simulation with logging turned on, and time it. Then run it (with the same input) with logging turned off, and time it. Compare the difference. Ideally, do this several times to counterbalance other disturbances. I suspect you'll find that the potential benefit of lots of logging vastly outweighs the performance hit.

Jon Skeet
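
One way to run Jon's comparison is to wrap the main loop in a wall-clock timer and toggle logging with a command-line flag. The sketch below is illustrative only; simulate_step() and the file name are hypothetical stand-ins for the real simulation:

    #include <chrono>
    #include <cmath>
    #include <cstdio>

    // Hypothetical stand-in for one step of the real simulation.
    static double simulate_step(long i) { return std::sin(i * 0.001); }

    int main(int argc, char**) {
        const bool logging = argc > 1;  // pass any argument to turn logging on
        FILE* log = logging ? std::fopen("sim.log", "w") : nullptr;

        const auto start = std::chrono::steady_clock::now();
        double sum = 0;  // checksum keeps the compiler from eliding the loop
        for (long i = 0; i < 10000000; ++i) {
            const double value = simulate_step(i);
            sum += value;
            if (log) std::fprintf(log, "%f\n", value);
        }
        const auto stop = std::chrono::steady_clock::now();
        if (log) std::fclose(log);

        const double secs = std::chrono::duration<double>(stop - start).count();
        std::fprintf(stderr, "checksum %f, elapsed %.2f s (logging %s)\n",
                     sum, secs, logging ? "on" : "off");
        return 0;
    }

Running it once with an argument and once without, on the same input, gives exactly the difference Jon describes.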
+3  A: 

You just need some back-of-the-envelope calculations, I think.

Let "hundreds of Mb" be 400MB.
Let "several hours to several days" be 48 hours.

(400 * 1024 * 1024 bytes) / (3600 * 48 seconds) = 2427 bytes/sec

Obviously, you can just watch your system or use real numbers for the calculation, but with the rough estimate above you're logging about 2 KB/sec, which is trivial compared to the sustained write throughput of any modern hard drive.

So, no, the overhead doesn't appear to be very big. And yes, there are more efficient ways to do it, but you would probably spend more time and effort than it's worth for the minuscule savings, unless your numbers are very different from what you stated.

Nathan
Hi Nathan, thanks for your answer. Just out of curiosity, and maybe for future readers of this question: can you provide some guidelines, if possible, about more efficient ways to do this?
Eduardo
"Efficient" is a bit of a vague term, but in general you're trying to do more with less resources. So typically you focus on bottlenecks. Writing too much to disk? Log less. Or log in a binary format that uses less space. Or don't write to the disk. Write to a ramdisk, or a network drive, etc.
Nathan
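
To make the binary-format suggestion concrete, here is a minimal sketch that dumps raw doubles with fwrite instead of formatted text (the file name and values are made up for illustration):

    #include <cstdio>

    int main() {
        // "wb" opens the file in binary mode; "sim.bin" is a made-up name.
        FILE* log = std::fopen("sim.bin", "wb");
        if (!log) return 1;

        for (long i = 0; i < 1000; ++i) {
            const double value = i * 0.5;  // placeholder for a real result
            // 8 raw bytes per number instead of ~10-20 bytes of text,
            // and no printf formatting work.
            std::fwrite(&value, sizeof value, 1, log);
        }
        std::fclose(log);
        return 0;
    }

The trade-off is a log that is no longer human-readable and not portable between machines with different byte order.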
Also pay attention to unused resources. Is your process CPU-bound with little memory usage? Maybe store all your logs in memory until the task completes. Is the process memory-intensive but light on the CPU? Pipe the logging through a CPU-intensive compression tool before writing to disk.
Nathan
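
A minimal sketch of the compression idea, assuming a POSIX system with gzip available on the PATH (popen/pclose are POSIX, not standard C++):

    #include <stdio.h>  // popen/pclose are declared here on POSIX systems

    int main() {
        // Stream the log through gzip so only compressed bytes hit the disk.
        FILE* log = popen("gzip > sim.log.gz", "w");
        if (!log) return 1;

        for (long i = 0; i < 1000; ++i)
            fprintf(log, "%f\n", i * 0.5);  // repetitive numbers compress well
        pclose(log);
        return 0;
    }

The gzip process burns CPU, but the amount of data written to disk shrinks considerably for a log that is mostly repetitive numbers.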
+1  A: 

You can buffer the data in an STL vector and do some processing on it before writing, for example:
- exclude repeated lines;
- save only the differences between entries;
- flush the data periodically rather than on every entry;
- select only the specific data worth saving;
- etc.

lsalamon
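
A minimal sketch of this buffering approach: collect values in a vector and write them out in large batches (the file name and values are placeholders). Note that stdio already buffers fprintf output, so the main benefit is the chance to deduplicate or diff the entries before anything reaches the disk:

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        FILE* log = std::fopen("sim.log", "w");  // made-up file name
        if (!log) return 1;

        const std::size_t kBatch = 100000;  // flush in batches of 100k values
        std::vector<double> buffer;
        buffer.reserve(kBatch);

        for (long i = 0; i < 1000000; ++i) {
            buffer.push_back(i * 0.5);  // placeholder for a real result
            if (buffer.size() == kBatch) {
                // This is the point where repeated lines could be dropped
                // or only differences kept, as the answer suggests.
                for (double v : buffer) std::fprintf(log, "%f\n", v);
                buffer.clear();
            }
        }
        for (double v : buffer) std::fprintf(log, "%f\n", v);  // final flush
        std::fclose(log);
        return 0;
    }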