views:

172

answers:

5

Hello!

I once had the theory that on modern operating systems, multithreaded read access to the HDD should perform better.

My reasoning was that the operating system queues all read requests and rearranges them so that it can read from the HDD more sequentially. The more requests it gets, the better it can rearrange them to optimize the read sequence.
I was quite sure I had read this somewhere a few times.

But I did some benchmarking and found that multithreaded read access mostly performs much worse, and never performs better.

I observed this under both Windows and Linux. I benchmarked plain file searching using the operating system's tools, and also wrote my own little benchmarks.
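For reference, here is a minimal sketch of the kind of benchmark I mean (in Python; the file sizes, counts, and thread count are arbitrary placeholders):

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    """Read one file completely and return its size in bytes."""
    with open(path, "rb") as f:
        return len(f.read())

def benchmark(paths, workers):
    """Read all files with the given number of threads; return (bytes, seconds)."""
    start = time.perf_counter()
    if workers == 1:
        total = sum(read_file(p) for p in paths)
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            total = sum(pool.map(read_file, paths))
    return total, time.perf_counter() - start

# create some throwaway test files
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(20):
    p = os.path.join(tmpdir, f"file{i}.bin")
    with open(p, "wb") as f:
        f.write(os.urandom(256 * 1024))  # 256 KiB each
    paths.append(p)

seq_total, seq_time = benchmark(paths, workers=1)
par_total, par_time = benchmark(paths, workers=8)
print(f"sequential: {seq_total} bytes in {seq_time:.3f}s")
print(f"8 threads:  {par_total} bytes in {par_time:.3f}s")
```

(Timings vary run to run, and OS caching makes the second pass much faster, so in my real tests I dropped the caches between runs.)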

Am I missing something?
Can someone explain to me the secrets of this topic?
Thank you!

+2  A: 

Well, apparently you're causing the read head to skip around all over the place. Your bottleneck is the disk, not the processor.

To re-phrase: the CPU might be parallel, but the disk isn't.

SpliFF
Well, as I said, I thought the operating system would optimize it if it knew from the start which read operations were requested.
ivan_ivanovich_ivanoff
But what I'm saying is that you can only put commands into the queue at a speed limited by the disk controller. Unless you perform supplementary operations between reads, you are still limited by the I/O pipeline, not the CPU.
SpliFF
Doing sequential reads keeps requests near each other and allows burst mode and other optimisations. If you are performing random access from many threads, you are more likely to break the linear nature of your reads. Command reordering is more of a SCSI thing anyway, so it may be that your disk doesn't do it, or that the reordering process itself is a bottleneck.
SpliFF
I understand what you mean about the OS doing the reordering, but I have no information on the specifics. I suspect it's largely left to the disk controller to decide.
SpliFF
+1  A: 

Whether or not you see a speedup will almost assuredly depend on the scenario you are looking at and the hardware. More details on your benchmarking methodology would be useful here.

At a coarse level, the opportunity for a speedup arises when you're not utilizing the maximum throughput of the I/O controller and its caches, or when you are overlapping I/O with CPU-intensive work that would otherwise block waiting for it.

Are you comparing doing reads of multiple small files spread out across the system, or just reading a few large files sequentially? You'll see different performance characteristics here.

Have you profiled with a good systems profiler like the (free) Windows Performance Toolkit to see what is going on in your benchmarks? This is practically a must.

These kinds of benchmarks can be a lot of fun to write and profile; don't let a few false starts get in the way of digging in and looking for speedups.

-Rick

Rick
+1  A: 

I think your assumption about the OS optimizing concurrent disk access is simply false. I imagine it does this sort of re-ordering when you use scatter/gather I/O from a single thread, but there's no practical way for it to optimize concurrent requests in this way. Any such scheme would introduce unnecessary latency in single-threaded reads. (The OS would have to wait a bit just in case a concurrent request came in.) Anyway, the short answer is that your concurrent requests are causing the read heads to jump all over the place. The OS cannot optimize this away.
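The scatter/gather I/O mentioned above corresponds to readv/writev on Unix-like systems: a single call from one thread that fills several buffers in order. A minimal sketch (Unix-only; the file contents here are just placeholders):

```python
import os
import tempfile

# write a small test file
fd, path = tempfile.mkstemp()
os.write(fd, b"abcdefghij")
os.close(fd)

# scatter read: one syscall fills both buffers, in order
fd = os.open(path, os.O_RDONLY)
buf1 = bytearray(4)
buf2 = bytearray(6)
n = os.readv(fd, [buf1, buf2])
os.close(fd)
print(n, bytes(buf1), bytes(buf2))  # 10 b'abcd' b'efghij'
```

Because all the requests arrive in a single call, the kernel can see them up front and schedule them together, which is exactly what it cannot do for requests trickling in from independent threads.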

Peter Ruderman
+1  A: 

I think you are talking about native command queuing, which may or may not be enabled on the system you are testing with. From the Wikipedia entry:

In fact, newer mainstream Linux kernels support AHCI natively. Windows XP requires the installation of a vendor-specific driver even if AHCI is present on the host bus adapter. Windows Vista natively supports both AHCI and NCQ. FreeBSD fully supports AHCI and NCQ since version 8.0.

Also, I haven't done any tests, but NCQ may not be that effective for a directory walk that has to access small files/inodes all over the disk. It could be that the disk controller is able to service each request fast enough that a queue never builds up to reorder, so you don't see any benefit.

AngerClown
+1  A: 

It's probably important here that you split the reading of the directory or file information away from the processing of that information. In other words, disk IO in one thread, processing and searching in another. Pass completed IO information to the processing thread with a bounded queue. By doing this you'll ensure that your IO thread is never waiting on the processing of results before getting busy on the read of the next block of data to process.
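A minimal sketch of that pattern in Python, with one I/O thread feeding one processing thread through a bounded queue (file names and the "search" step are placeholders):

```python
import os
import queue
import tempfile
import threading

SENTINEL = object()  # marks the end of the stream

def io_worker(paths, out_q):
    # disk I/O only: read each file and hand the bytes off;
    # the bounded queue applies back-pressure if processing lags
    for path in paths:
        with open(path, "rb") as f:
            out_q.put((path, f.read()))
    out_q.put(SENTINEL)

def process_worker(in_q, results):
    # CPU work only: here, a toy "search" that counts newlines
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        path, data = item
        results[path] = data.count(b"\n")

# throwaway test files
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(5):
    p = os.path.join(tmpdir, f"f{i}.txt")
    with open(p, "wb") as f:
        f.write(b"line\n" * (i + 1))
    paths.append(p)

q = queue.Queue(maxsize=4)  # bounded: the reader never runs far ahead
results = {}
reader = threading.Thread(target=io_worker, args=(paths, q))
searcher = threading.Thread(target=process_worker, args=(q, results))
reader.start(); searcher.start()
reader.join(); searcher.join()
print(results)
```

This way the I/O thread issues reads back to back in whatever order the files are listed, while the processing never stalls the disk.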

Ross Judson