In Goetz's "Java Concurrency in Practice", in a footnote on page 101, he writes "For computational problems like this that do no I/O and access no shared data, Ncpu or Ncpu+1 threads yield optimal throughput; more threads do not help, and may in fact degrade performance..."

My question is, when performing I/O operations such as file writing, file reading, file deleting, etc, are there guidelines for the number of threads to use to achieve maximum performance? I understand this will be just a guide number, since disk speeds and a host of other factors play into this.

Still, I'm wondering: can 20 threads write 1000 separate files to disk faster than 4 threads can on a 4-cpu machine?

+2  A: 

Like all performance related things it depends.

If you're I/O bound, then adding threads won't help you at all. (OK, as Steven Sudit points out, you might get an increase in performance, but it'll be small.) If you're not I/O bound, then adding threads may help.

Not trying to be smart, but the best way to find out is to profile it and see what works for your particular circumstances.

Edit: Updated based on comments

Glen
I'm not going to downvote you, but as I explained in my answer, my experience differs from this.
Steven Sudit
No, I'm not talking about a small improvement. I'm talking three or four times faster on a dual-core processor.
Steven Sudit
The one thing we clearly agree on is that, in these matters, practice beats theory. Code it so it works both ways and see for yourself. I was surprised when I saw the magnitude of the improvement.
Steven Sudit
Glen, re: profiling, I've been doing that, and what I've found so far is that the difference between 4 threads (on a 4-cpu machine) and 20 isn't that striking, but with, say, 100, the degradation is significant.
Adding threads past the point of diminishing returns will definitely slow things down. However, I'm talking about the difference between 1 thread and, say, 8. The overhead is small in comparison to the increased CPU and I/O utilization, so it's a net gain.
Steven Sudit
Adding more threads with I/O operations can let you queue things more efficiently and better hide latency.
Eric
@Eric: Yes, that's a good explanation for the benefits of pipelining I/O.
Steven Sudit
+4  A: 

In practice, I/O-bound applications can still benefit substantially from multithreading because it can be much faster to read or write a few files in parallel than sequentially. This is particularly the case where overall throughput is compromised by network latency. But it's also the case that one thread can be processing the last thing that it read while another thread is busy reading, allowing higher CPU utilization.

We can talk theory all day, but the right answer is to make the number of threads configurable. I think you'll find that increasing it past 1 will boost your speed, but there will also come a point of diminishing returns.
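A minimal sketch of that advice, assuming a hypothetical harness class `FileWriteBench` (the pool sizes, file count, and payload below are made-up illustration values, not recommendations):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FileWriteBench {
    // Writes 'fileCount' small files using a pool of 'threads' workers and
    // returns the elapsed milliseconds, so the thread count can be tuned
    // empirically instead of argued about in theory.
    static long writeFiles(int threads, int fileCount, Path dir, String content)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < fileCount; i++) {
            Path target = dir.resolve("file-" + i + ".txt");
            pool.submit(() -> {
                try {
                    Files.writeString(target, content); // each thread writes its own file
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("bench");
        // Try a few pool sizes and compare; the sweet spot is machine-specific.
        for (int threads : new int[] {1, 4, 20}) {
            long ms = writeFiles(threads, 200, dir, "same payload for every file");
            System.out.println(threads + " threads: " + ms + " ms");
        }
    }
}
```

Running this with several pool sizes on the target machine is the "make it configurable and measure" approach in miniature.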

Steven Sudit
It sounds like the OP is talking about having all the threads doing the same operation, not one reading and one processing.
Bill the Lizard
Well, if each thread is reading and then processing a different file, you still get parallelism.
Steven Sudit
My apologies for the confusion. I've edited the post to make it clear that I'm not talking about writing the same file with X different threads. I simply mean each thread writing a different file (but containing the same string, so we're comparing apples to apples).
@marc: Thanks for the clarification.
Bill the Lizard
+1 for the "make the number of threads configurable" part.
Martinho Fernandes
+2  A: 

Yes, 20 threads can definitely write to disk faster than 4 threads on a 4 CPU machine. Many real programs are I/O bound more than CPU bound. However, it depends in great detail on your disks and how much CPU work your other threads are doing before they, too, end up waiting on those disks.

If all of your threads are solely writing to disk and doing nothing else, then it may well be that 1 thread on a 4 CPU machine is actually the fastest way to write to disk. It depends entirely on how many disks you have, how much data you're writing, and how good your OS is at I/O scheduling. Your specific question suggests you want 4 threads all writing to the same file. That doesn't make much sense, and in any practical scenario I can't think how that'd be faster. (You'd have to allocate the file ahead of time, then each thread would seek() to a different position, and you'd end up just thrashing the write head as each thread tried to write some blocks.)

The advantage of multithreading is much simpler when you're network bound, i.e., waiting on a database server, a web browser, or the like. There you're waiting on multiple external resources.

Nelson
+1  A: 

Ncpu + the expected number of concurrent I/O operations is my usual number.

The key isn't that 20 threads can write files to disk faster than 4 threads. If you only have one thread per CPU, then while your process is writing to disk it cannot use the CPU hosting the thread that is doing the file I/O. That CPU is effectively waiting for the file to be written, whereas if you have one more thread, it can use that CPU to do real processing in the interim.
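That rule of thumb can be sketched in a few lines; the expected number of blocked I/O operations (4 in the example) is a workload-specific guess supplied by the caller, not something the code can measure:

```java
public class PoolSizing {
    // Rule of thumb from above: one thread per CPU, plus one thread for each
    // I/O operation expected to be blocked at any given moment, so the CPUs
    // stay busy while some threads wait on the disk.
    static int suggestedThreads(int cpus, int expectedBlockedIo) {
        return cpus + expectedBlockedIo;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // 4 concurrently blocked I/O operations is an assumed figure
        // for illustration only.
        System.out.println("Suggested pool size: " + suggestedThreads(cpus, 4));
    }
}
```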

patros
Indeed. At the OS level, I/O is asynchronous, so making a synchronous call just means your thread will block. If no other threads are available to be scheduled, the CPU utilization will drop and you might think you're I/O-bound, even though you haven't reached the limits of your pipe.
Steven Sudit
+1  A: 

If you are using synchronous I/O, then you should have one thread for every simultaneous I/O request your machine can handle. In the case of a single-spindle hard disk, that's 1 (you can either read or write, but not both simultaneously). For a disk that can handle many I/O requests at once, it's however many requests it can service simultaneously.

In other words, this is not bounded by the CPU count, as I/O does not really hit the CPU beyond submitting requests and waiting. See here for a better explanation.

There's a whole other can of worms with how many I/O requests you should have in flight at any given time.
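One way to experiment with that is to cap the number of in-flight requests with a semaphore. This is a sketch, not a prescription: the class name `BoundedIo` and the `maxInFlight` value are illustrative, and the right bound has to be found by measurement.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

public class BoundedIo {
    // Caps in-flight I/O with a Semaphore so the number of simultaneous
    // requests matches what the device can actually service.
    private final Semaphore permits;

    BoundedIo(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    void submit(ExecutorService pool, Runnable ioTask) throws InterruptedException {
        permits.acquire();                 // block until a request slot frees up
        pool.submit(() -> {
            try {
                ioTask.run();
            } finally {
                permits.release();         // the slot becomes available again
            }
        });
    }
}
```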

MSN
+2  A: 

See also http://stackoverflow.com/questions/1033065

UPDATE: I added a benchmark there.

RED SOFT ADAIR
Thanks for directing me there. Unfortunately, the accepted answer to that question is mistaken, and there's little we can do about it other than add our comments.
Steven Sudit
Adding comments and voting on the posts, of course.
RED SOFT ADAIR
A: 

If the only thing those threads do is write to the disk, then your performance increase will be negligible, or the extra threads may even be harmful: drivers are usually optimized for sequential access on hard drives, so you'd be transforming one sequential write to a file into several "random" writes.

Multithreading can only help with I/O-bound problems, in terms of raw performance, if the I/O is performed against different disks, different network cards, or different database servers. Nonetheless, in terms of observed performance the difference can be much greater.

For example, imagine you're sending several files to a lot of different receivers over a network. You're still network bound, so your maximum speed won't be higher than, say, 100 Mb/s, but if you use 20 threads then the process will be much fairer.

Jorge Córdoba
Due to latency, a single thread is not going to saturate a network card, but multiple threads can. In other words, there's a soft cap and a hard cap.
Steven Sudit
I have found one thread has no problem saturating a 1 Gb/s network card. In fact, a single thread can pump about 3-4 Gb/s over loopback for relatively small message sizes. I haven't tried a 10 Gb/s network card, but I'm hoping to get my hands on a few in about a month.
Peter Lawrey
@Peter: Latency is not the same as bandwidth.
Steven Sudit