views:

687

answers:

12

On a modern system, can local hard-disk write speeds be improved by compressing the output stream?

This question derives from a case I'm working on where a program serially generates and dumps around 1-2 GB of text logging data to a raw text file on the hard disk, and I think it is I/O bound. Should I expect to be able to decrease runtimes by compressing the data before it goes to disk, or would the overhead of compression eat up any gain I could get? Would having an idle second core affect this?

I know this would be affected by how much CPU is being used to generate the data, so rules of thumb on how much idle CPU time would be needed would be helpful.


I recall a video talk where someone used compression to improve read speeds for a database, but IIRC compressing is a lot more CPU-intensive than decompressing.

+1  A: 

If it's just text, then compression could definitely help. Just choose a compression algorithm and settings that make the compression cheap. "gzip" is cheaper than "bzip2", and both have parameters that you can tweak to favor speed or compression ratio.
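
For instance, a minimal sketch of what "cheap" looks like through zlib's gzip interface (the file name and log line are made up; "wb1" asks for compression level 1, i.e. Z_BEST_SPEED):

    /* Sketch: stream log text through zlib's gzip layer at the cheapest
       compression level. Assumes zlib is installed; build with -lz. */
    #include <stdio.h>
    #include <zlib.h>

    int main(void)
    {
        gzFile out = gzopen("app.log.gz", "wb1");   /* level 1 = favor speed */
        if (out == NULL) {
            fprintf(stderr, "gzopen failed\n");
            return 1;
        }

        const char *line = "2009-06-01 12:00:00 some very repetitive log text\n";
        for (int i = 0; i < 1000000; i++)
            gzputs(out, line);                      /* compressed on the way out */

        gzclose(out);
        return 0;
    }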

Joachim Sauer
A: 

This used to be something that could improve performance in quite a few applications way back when. I'd guess that today it's less likely to pay off, but it might in your specific circumstance, particularly if the data you're logging is easily compressible.

However, as Shog9 commented:

Rules of thumb aren't going to help you here. It's your disk, your CPU, and your data. Set up a test case and measure throughput and CPU load with and without compression - see if it's worth the tradeoff.
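
A bare-bones version of such a test might look like this (an untested sketch; sizes and file names are arbitrary, it measures wall-clock only so check CPU load separately with top or time(1), and you'd want data larger than RAM or an fsync() to measure the disk rather than the page cache):

    /* Rough benchmark sketch: write the same buffer raw, then compressed
       with zlib at level 1, and compare wall-clock throughput.
       Build with -lz. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <zlib.h>

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        size_t n = 256 * 1024 * 1024;        /* 256 MB of fake "log text" */
        char *buf = malloc(n);
        memset(buf, 'x', n);                 /* highly compressible, like real logs */

        double t0 = now_sec();
        FILE *raw = fopen("raw.log", "wb");
        fwrite(buf, 1, n, raw);
        fclose(raw);                         /* page cache will flatter this number */
        double t_raw = now_sec() - t0;

        uLongf zlen = compressBound(n);
        unsigned char *zbuf = malloc(zlen);
        t0 = now_sec();
        compress2(zbuf, &zlen, (const unsigned char *)buf, n, 1);  /* level 1 = cheap */
        FILE *cmp = fopen("compressed.log.z", "wb");
        fwrite(zbuf, 1, zlen, cmp);
        fclose(cmp);
        double t_cmp = now_sec() - t0;

        printf("raw: %.1f MB/s   compressed: %.1f MB/s   ratio %.2f:1\n",
               n / 1e6 / t_raw, n / 1e6 / t_cmp, (double)n / zlen);
        free(buf);
        free(zbuf);
        return 0;
    }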

Michael Burr
Interesting. I guess as soon as you can't keep the compression data set in cache, it starts depending on the speed of your RAM vs. your HDD. Also, the HDD is a serial push pipe, so it shouldn't suffer from latency issues as much. Modern perf tuning just gets wacky.
BCS
+1  A: 

This depends on lots of factors and I don't think there is one correct answer. It comes down to this:

Can you compress the raw data faster than the raw write performance of your disk multiplied by the compression ratio you are achieving (or the multiple in speed you are trying to get), given the CPU bandwidth you have available to dedicate to this purpose?

Given today's relatively high data write rates, in the tens of MBytes/second, this is a pretty high hurdle to get over. To the point of some of the other answers, you would likely need easily compressible data, and you would just have to benchmark it with some quick sanity-check experiments and find out.

As for a specific opinion (a guess!) on the point about additional cores: if you thread the compression of the data and keep the core(s) fed, then, given the high compression ratio of text, it is likely such a technique would bear some fruit. But this is just a guess. In a single-threaded application alternating between disk writes and compression operations, it seems much less likely to me.

Tall Jeff
+2  A: 

CPU speeds have grown at a faster rate than hard drive access speeds. Even back in the '80s, many compressed files could be read off the disk and uncompressed in less time than it took to read the original (uncompressed) file. That will not have changed.

Generally though, these days the compression/de-compression is handled at a lower level than you would be writing, for example in a database I/O layer.

As for the usefulness of a second core: it only counts if the system will also be doing a significant number of other things, and your program would have to be multi-threaded to take advantage of the additional CPU.

Alister Bulman
Unless the OS is compressing the disk, I *am* the lowest level (edited the question to reflect this). Regarding 2 cores, I was thinking about having the second core do all the compression.
BCS
+4  A: 

Yes, this has been true for at least 10 years. There are operating-systems papers about it. I think Chris Small may have worked on some of them.

For speed, gzip/zlib compression on lower quality levels is pretty fast; if that's not fast enough you can try FastLZ. A quick way to use an extra core is just to use popen(3) to send output through gzip.
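
Concretely, the popen trick can be as little as this (a sketch; the file name and log format are placeholders):

    /* Sketch of the popen(3) trick: the logger writes plain text into a pipe
       and a gzip child process compresses it and writes the result to disk,
       potentially on another core. */
    #include <stdio.h>

    int main(void)
    {
        FILE *log = popen("gzip -1 > app.log.gz", "w");   /* -1 favors speed */
        if (log == NULL) {
            perror("popen");
            return 1;
        }

        for (int i = 0; i < 1000000; i++)
            fprintf(log, "event %d: something happened\n", i);

        pclose(log);   /* flushes the pipe and waits for gzip to exit */
        return 0;
    }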

Norman Ramsey
I wonder what the overhead for the popen trick is.
BCS
The overhead is a fork and exec, which is fairly significant, so if only performance matters, it's probably worth doing only on a multicore system. On the other hand, the ease of programming is unparalleled, so as a quick hack it's worth doing on any system.
Norman Ramsey
I was thinking the IPC overhead. What does it cost to stuff 1GB of data down a '|'?
BCS
I don't know how well I/O libraries are optimized these days. An OS expert would know. My guess is two copies: from userspace to kernelspace and back. Since compression requires touching every byte anyway, the copying overhead is probably insignificant. Don't know about the process overhead.
Norman Ramsey
+1  A: 

Logging the data in binary form may be a quick improvement. You'll write less to the disk and the CPU will spend less time converting numbers to text. It may not be useful if people are going to be reading the logs, but they won't be able to read compressed logs either.
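
For illustration, here is the same record logged both ways (the struct layout is invented for the example):

    /* Sketch contrasting text vs. binary logging of the same record. */
    #include <stdio.h>
    #include <stdint.h>

    struct log_record {
        uint32_t timestamp;
        uint32_t event_id;
        double   value;
    };

    int main(void)
    {
        struct log_record rec = { 1244000000u, 42u, 3.14159 };

        /* Text: ~30 bytes per record plus CPU time spent formatting numbers. */
        FILE *text = fopen("log.txt", "w");
        fprintf(text, "%u %u %f\n", (unsigned)rec.timestamp,
                (unsigned)rec.event_id, rec.value);
        fclose(text);

        /* Binary: exactly sizeof rec bytes and no number-to-text conversion. */
        FILE *bin = fopen("log.bin", "wb");
        fwrite(&rec, sizeof rec, 1, bin);
        fclose(bin);

        return 0;
    }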

Mark James
Good point. OTOH, most of my reading is done in a diff tool to compare different runs (at what point does run A differ from run B?), so I'd need a diff tool that can read inside a compressed stream. The binary format could also be diffed, but it would be trickier.
BCS
+4  A: 

Yes, yes, yes, absolutely.

Look at it this way: take your maximum contiguous disk write speed in megabytes per second. (Go ahead and measure it; time a huge fwrite or something.) Let's say 100 MB/s. Now take your CPU speed in megahertz; let's say 3 GHz = 3000 MHz. Divide the CPU speed by the disk write speed. That's the number of cycles the CPU spends idle per byte written, which you can spend on compression instead. In this case 3000/100 = 30 cycles per byte.

If you had an algorithm that could compress your data by 25% for an effective 125 MB/s write speed, you would have 24 cycles per byte to run it in, and it would basically be free because the CPU wouldn't be doing anything else anyway while waiting for the disk to churn. 24 cycles per byte = 3072 cycles per 128-byte cache line, easily achieved.

We do this all the time when reading optical media.

If you have an idle second core it's even easier. Just hand off the log buffer to that core's thread and it can take as long as it likes to compress the data since it's not doing anything else! The only tricky bit is you want to actually have a ring of buffers so that you don't have the producer thread (the one making the log) waiting on a mutex for a buffer that the consumer thread (the one writing it to disk) is holding.
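
A bare-bones sketch of that hand-off with pthreads (the buffer count, buffer size, and the compress_and_write stub are placeholders; a real version would call zlib or FastLZ in the consumer):

    /* Producer fills log buffers; a consumer thread compresses and writes
       them, so the producer never blocks on compression or the disk. */
    #include <pthread.h>
    #include <string.h>

    #define NBUF  4
    #define BUFSZ (1 << 20)                 /* 1 MB per buffer */

    static char bufs[NBUF][BUFSZ];
    static size_t lens[NBUF];
    static int head = 0, tail = 0, count = 0, done = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    static void compress_and_write(const char *data, size_t len)
    {
        /* Placeholder: run zlib/FastLZ here and fwrite the result. */
        (void)data; (void)len;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0 && !done)
                pthread_cond_wait(&not_empty, &lock);
            if (count == 0 && done) { pthread_mutex_unlock(&lock); break; }
            int i = tail;
            pthread_mutex_unlock(&lock);

            compress_and_write(bufs[i], lens[i]);   /* slow work outside the lock */

            pthread_mutex_lock(&lock);
            tail = (tail + 1) % NBUF;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, consumer, NULL);

        for (int block = 0; block < 100; block++) {
            pthread_mutex_lock(&lock);
            while (count == NBUF)                   /* ring full: wait, don't spin */
                pthread_cond_wait(&not_full, &lock);
            int i = head;
            pthread_mutex_unlock(&lock);

            memset(bufs[i], 'x', BUFSZ);            /* stand-in for real log text */
            lens[i] = BUFSZ;

            pthread_mutex_lock(&lock);
            head = (head + 1) % NBUF;
            count++;
            pthread_cond_signal(&not_empty);
            pthread_mutex_unlock(&lock);
        }

        pthread_mutex_lock(&lock);
        done = 1;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
        pthread_join(t, NULL);
        return 0;
    }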

Crashworks
+3  A: 

For what it is worth, Sun's ZFS filesystem can have on-the-fly compression enabled to decrease the amount of disk I/O without a significant increase in overhead, as an example of this in practice.

Chealion
+2  A: 

Windows already supports file compression in NTFS, so all you have to do is set the "Compressed" flag in the file attributes. You can then measure whether it was worth it or not.
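
If you'd rather flip the flag from code than from Explorer's properties dialog, something like this should work (a Windows-specific sketch; the file name is made up):

    /* Sketch: enable NTFS compression on an existing file. */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE h = CreateFileA("app.log", GENERIC_READ | GENERIC_WRITE,
                               0, NULL, OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        USHORT format = COMPRESSION_FORMAT_DEFAULT;   /* turn compression on */
        DWORD returned = 0;
        if (!DeviceIoControl(h, FSCTL_SET_COMPRESSION, &format, sizeof(format),
                             NULL, 0, &returned, NULL)) {
            fprintf(stderr, "FSCTL_SET_COMPRESSION failed: %lu\n", GetLastError());
        }

        CloseHandle(h);
        return 0;
    }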

codymanix
+2  A: 

The Filesystems and Storage Lab at Stony Brook published a rather extensive performance (and energy) evaluation of file data compression on server systems at IBM's SYSTOR systems research conference this year: paper at the ACM Digital Library, presentation.

The results depend on

  • the compression algorithm and settings used,
  • the file workload and
  • the characteristics of your machine.

For example, in the measurements from the paper, with a textual workload in a server environment, using lzop with low compression effort is faster than a plain write, but bzip and gz are not.

In your specific setting, you should try it out and measure. It really might improve performance, but it is not always the case.

dmeister
+1  A: 

If you are I/O bound saving human-readable text to the hard drive, I expect compression to reduce your total runtime.

If you have an idle 2 GHz core, and a relatively fast 100 MB/s streaming hard drive, halving the net logging time requires at least 2:1 compression and no more than roughly 10 CPU cycles per uncompressed byte for the compressor to ponder the data. With a dual-pipe processor, that's (very roughly) 20 instructions per byte.

I see that LZRW1-A (one of the fastest compression algorithms) uses 10 to 20 instructions per byte, and compresses typical English text about 2:1. At the upper end (20 instructions per byte), you're right on the edge between I/O bound and CPU bound. At the middle and lower end, you're still I/O bound, so there are a few cycles available (not many) for a slightly more sophisticated compressor to ponder the data a little longer.

If you have a more typical non-top-of-the-line hard drive, or the hard drive is slower for some other reason (fragmentation, other multitasking processes using the disk, etc.) then you have even more time for a more sophisticated compressor to ponder the data.

You might consider setting up a compressed partition, saving the data to that partition (letting the device driver compress it), and comparing the speed to your original speed. That may take less time and be less likely to introduce new bugs than changing your program and linking in a compression algorithm.

I see a list of compressed file systems based on FUSE, and I hear that NTFS also supports compressed partitions.

David Cary
So for a fast HDD and a so-so CPU, saturating the disk with compressed data will fully utilize about one core. And if my disk isn't that good, or my CPU is somewhat higher-end, or I can offload the compression to an otherwise unused core, I can increase the amount of uncompressed data dealt with and still have some CPU available to generate the data in the first place.
BCS
Yes, this rough estimate tells me that saturating a fast HDD with compressed text will use somewhere in the range of half to all of a 2 GHz core, doubling the amount of uncompressed text dealt with.
David Cary
A: 

If this particular machine is often IO bound, another way to speed it up is to install a RAID array. That would give a speedup to every program and every kind of data (even incompressible data).

For example, the popular RAID 1+0 configuration with 4 total disks gives a speedup of nearly 2x.

The nearly as popular RAID 5 configuration, with the same 4 total disks, gives a speedup of nearly 3x.

It is relatively straightforward to set up a RAID array with a speed 8x the speed of a single drive.

High compression ratios, on the other hand, are apparently not so straightforward. Compression of "merely" 6.30 to one would give you a cash prize for breaking the current world record for compression (Hutter Prize).

David Cary