views:

204

answers:

6

Simultaneous Or Sequential write operation-- Does it matter in terms of speed?

With a multicore processor, does it make sense to parallelize all the file write operations using multiple threads, just to get a speed boost? Of course, all of those write operations are independent.
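For concreteness, here is a minimal sketch of the setup being asked about, assuming Python and a thread pool (the question names no language); each write goes to its own file, so the writes are fully independent:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_file(path, data):
    # Each write is independent: its own file, its own data.
    with open(path, "wb") as f:
        f.write(data)

def write_all_parallel(paths, payloads, workers=4):
    # Fan the independent writes out over a thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(write_file, paths, payloads))

# Usage: four 1 KiB files written concurrently.
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, f"out{i}.bin") for i in range(4)]
payloads = [bytes([i]) * 1024 for i in range(4)]
write_all_parallel(paths, payloads)
```

Whether this actually beats a plain sequential loop depends entirely on the storage underneath, which is what the answers turn on.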

+2  A: 

That depends on the disks and their controller. Do they have TCQ/NCQ? Is it RAID? If so, it might make some sense. With a single regular SATA disk without NCQ, it won't.

vartec
+2  A: 

Write the simplest code first, and see whether that performs well enough with the target environment. (Different disks, operating system versions, CPUs, drivers etc may well affect the result significantly.)

If the simplest correct code isn't fast enough, then it makes sense to try to work out faster ways of performing IO. At a guess, it might make sense to parallelize the write operations if you're writing to different disks, but possibly not otherwise. That's only a complete guess though.

Purely by coincidence, I'm planning to benchmark a related situation soon. I have a blog post describing the tests I intend to perform, and will update the entry with a link to results when I've got some. It's not quite the same as what you're describing, but close enough to perhaps be of interest.

Jon Skeet
This is all true, but I don't see the relevance to the question. Perhaps the OP already knows the simplest-code-first mantra, but he wants to glean some general knowledge about computer architecture from the code perspective.
jhs
The question asked whether or not it makes sense to use a complicated technique. I think my answer of essentially "It doesn't make sense if your app performs well enough already" is extremely relevant. The OP certainly hasn't indicated that he *does* have a performance problem.
Jon Skeet
+3  A: 

Generally, no.

As of now, the physical write to disk IS the bottleneck by some orders of magnitude, and in most scenarios it is rather sequential. By parallelizing writes you have a good chance of worsening performance by incurring extra seeks. Sequential reads and writes will largely outperform interleaving in most cases.

Per-disk parallelization (TCQ and NCQ) mainly works by reducing the seeks that are naturally required when different clients concurrently request data from different sections of the disk. If you can avoid those seeks in the first place, you are better off.

In some scenarios - RAID 1, JBOD, or when different streams of data arrive rather slowly - the right scheduling can improve your throughput, but that requires intimate knowledge of the hardware at hand, and it requires that other processes don't spoil your fun.


At best, you can leave that as a decision to the end user (e.g. give an option to turn it off), and provide performance measures to guide him. (You might even prove me wrong ;))

peterchen
Mmm. If the disk has command queuing and re-ordering, then in practice the head will basically swing from the left extreme to the right extreme, collecting data as it goes. Parallel writes in this situation improve throughput considerably, since there is no real seek overhead.
Blank Xavier
Only if the application needs significant time to prepare the data. Usually, the physical write to the disk is the bottleneck by a factor of 10 or more, swinging the head around won't make it faster.
peterchen
A: 

If you are talking about writing to one file, the answer is no. You can't parallelize writing to one file since every process or thread has to acquire a lock for the file from the OS to do writes.

Otherwise, it depends on the hardware controllers, the type of storage, the OS kernel, and the filesystem implementation.

Vasil
I didn't vote you down, but of course you can parallelize writes to a single file. You just have to have the different threads writing to different parts of the file.
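A sketch of that, assuming Python on a POSIX system: `os.pwrite` writes at an absolute offset without touching the shared file position, so threads can target disjoint regions of one file without any application-level locking:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_chunk(fd, offset, data):
    # pwrite ignores the shared file offset, so concurrent calls on the
    # same descriptor don't race on seek-then-write.
    os.pwrite(fd, data, offset)

path = os.path.join(tempfile.mkdtemp(), "shared.bin")
fd = os.open(path, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, 4 * 1024)  # pre-size the file so every offset is valid
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(write_chunk, fd, i * 1024, bytes([65 + i]) * 1024)
               for i in range(4)]
    for fut in futures:
        fut.result()  # surface any errors from the worker threads
os.close(fd)
```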
Eddie
A: 

Technically, you can mmap a file and have multiple threads write to it, but the disk will probably still create a bottleneck.
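A sketch of the mmap approach, assuming Python: each thread fills its own slice of the mapping, and the OS flushes the dirty pages to disk at its leisure, so the disk remains the bottleneck:

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

path = os.path.join(tempfile.mkdtemp(), "mapped.bin")
size = 4 * 1024
with open(path, "wb") as f:
    f.truncate(size)  # the mapping needs the file pre-sized

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), size) as m:
        def fill(i):
            # Each thread writes its own 1 KiB slice of the mapping.
            m[i * 1024:(i + 1) * 1024] = bytes([i]) * 1024
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(fill, range(4)))
```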

If you need to maximize I/O throughput, a starting point would be to investigate the asynchronous I/O your environment supports.

HUAGHAGUAH
A: 

This is a simple question, but the answer can be really, really complicated. Let's try to narrow down the scenario with some assumptions: the OS is Windows, and you have a relatively large number of writes that are truly independent.

  1. You can skip the multi-threading by simply issuing the writes asynchronously.
  2. Issue them all at once - let the OS schedule the writes
  3. It doesn't matter if the writes are to the same file or to different files. Note that this is only true if the assumption above about the writes being independent holds.
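A sketch of point 1, assuming Python rather than the Win32 overlapped-I/O API: asyncio has no native asynchronous disk I/O, so this approximates "issue them all at once" by delegating each write to a thread pool and gathering the results:

```python
import asyncio
import os
import tempfile

async def write_async(loop, path, data):
    # asyncio has no native file I/O; delegate to the default thread
    # pool so all outstanding writes are in flight at once.
    def do_write():
        with open(path, "wb") as f:
            f.write(data)
    await loop.run_in_executor(None, do_write)

async def main(paths, payloads):
    loop = asyncio.get_running_loop()
    # Issue every write at once, then wait for all of them.
    await asyncio.gather(*(write_async(loop, p, d)
                           for p, d in zip(paths, payloads)))

tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, f"w{i}.bin") for i in range(3)]
payloads = [b"x" * 512] * 3
asyncio.run(main(paths, payloads))
```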

Worst case, this won't be any slower than a single plain old everyday disk on a parallel ATA controller: it will be slow.

Best case, the OS can schedule the writes very efficiently. This would be true in the case of a storage system with lots of spindles, or with a disk that supports NCQ.

The key thing to remember here is that disk I/O (in general) isn't CPU bound, so going out of your way to use multi-core won't help you; it will just make life complex.

Note, you can help things along if you order the writes so they are sequential within a file (overall), or sequential on the disk, by sorting them by their extent.
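That ordering trick can be sketched as follows (Python, POSIX `os.pwrite`; the offsets here stand in for on-disk extents): sorting the queued writes by offset turns a random pattern into a single ascending sweep:

```python
import os
import tempfile

def flush_sorted(fd, pending):
    # Sort queued (offset, data) writes so the device sees one ascending
    # sweep instead of back-and-forth seeks.
    for offset, data in sorted(pending):
        os.pwrite(fd, data, offset)

path = os.path.join(tempfile.mkdtemp(), "ordered.bin")
fd = os.open(path, os.O_CREAT | os.O_RDWR)
# Queued out of order; flushed in ascending offset order.
flush_sorted(fd, [(2048, b"c" * 1024), (0, b"a" * 1024), (1024, b"b" * 1024)])
os.close(fd)
```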

Foredecker