+6  A: 

All data is cached in buffers before it is actually laid down on the physical disk: one buffer in the operating system, another inside the disk itself (probably a 32 MB buffer). While you are filling those buffers, your program runs at full speed and 100% CPU. Once the buffers are full, your program has to wait for the disk, which is much, much slower than memory and buffers, and that wait is what stops it from consuming all that CPU.

Maybe you can make your code "wait for the disk" from the start, by using some Perl equivalent of fflush().
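A minimal sketch of what that could look like (the $out filehandle is an assumption for illustration, not from the original post):

use IO::Handle;

# Flush Perl's own buffer down to the OS after a write,
# roughly what fflush() does in C:
$out->flush;

# Or make every print flush automatically:
$out->autoflush(1);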

Havenard
I expect there to be file buffers. But not several GB in size (?)
Peter Mortensen
On Linux systems, buffers are usually configured to spread into nearly all free RAM.
Pavel Shved
He doesn't use Linux...
Workshop Alex
+4  A: 

Maybe the OS is writing to disk as fast as it can (85 MB/s), putting the excess 35 MB/s into a buffer, and pausing the app to flush the buffer when it fills. Since the buffer fills at 35 MB/s but drains at 85 MB/s, you'd expect draining to take 35/85 = ~0.4 times as long as filling. That's broadly compatible with your graph, if I squint enough.

You can estimate the size of the buffer as the product of the pause time and the disk speed.
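As a back-of-envelope sketch (the pause length here is an assumed example, not a measured value):

# Buffer size ~= pause time x sustained disk speed
my $disk_speed_mb_s = 85;    # write speed from the graph
my $pause_s         = 10;    # hypothetical pause length
my $buffer_mb       = $disk_speed_mb_s * $pause_s;
print "estimated buffer: $buffer_mb MB\n";   # ~850 MB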

Tom Anderson
+3  A: 

Look at the graph! The green line indicates the average disk queue length. At one moment it peaks, and right afterwards both the CPU and the IO writes drop to 0. They return to normal until the next queue peak appears, when the pattern repeats: queue peak, then CPU and IO drop, then recovery, and so on.

It could be that the disk is doing the physical writes at that moment. However, it could also be that the system is doing a disk validation at that moment, reading back the data it just wrote to make sure it was written correctly.

Another thing I notice is the 2.7 GB size. Since you're running this on a Windows system, that makes me a bit suspicious, because it's about the amount of memory that Windows can handle for a 32-bit process. 64-bit Windows will provide the application up to 3 GB of RAM (a bit less), but then it needs to release it again. You might want to use Process Explorer to check the amount of RAM in use and the amount of IO reads.

And perhaps use a 64-bit Perl version...

Workshop Alex
Regarding 2.7 GB: I don't know if more than 3 GB is possible, but it can already happen at 1 GB. For instance, just before I wrote this I ran it again and the first phase ended at 1.2 GB (somewhere between 1139 MB and 1273 MB).
Peter Mortensen
What do you mean by amount of RAM? Amount for the Perl process? "Private bytes" for the Perl process stays constant at 4 MB during the run. About 6.3 GB RAM is free when the script is started.
Peter Mortensen
I just tried another run. This time the first phase ended at approximately 4.3 GB (somewhere between 4.19 GB and 4.41 GB [4288.3 MB; 4513.7 MB]). Here is a transcript of the run: http://www.pil.sdu.dk/1/until2039-12-31/PerlPerfTranscript_2009-09-07b.txt
Peter Mortensen
I will try to install the 64 bit version of Perl from ActiveState and test it.
Peter Mortensen
The problem with Process Explorer is that the system becomes so unresponsive that there are no screen updates in Process Explorer. Performance Monitor also stops updating and I don't know if it actually samples correctly during the unresponsive period.
Peter Mortensen
A 32-bit process won't be able to use more than 3 GB under Windows. One GB is always reserved for Windows, and part of the memory will be used by Perl itself, plus some data. It could be that some add-in/plugin is allocating this RAM without reporting it to your graph. It does seem like it's just filling its own in-memory buffer first before writing it to disk, although the disk does seem to report write IO.
Workshop Alex
The 64-bit version, if available, might be a better fit for a 64-bit system. There's no guarantee that it will behave better, but if something is using RAM for buffering, you would at least have a much bigger buffer, since the 64-bit version can use all of it while the 32-bit version is limited...
Workshop Alex
I have tried the 64 bit version now (see updated question). It is definitely better. But it would be good to know exactly why.
Peter Mortensen
Well, the 64-bit version is able to use much more memory than the 32-bit version. It could be that there's some kind of buffering going on somewhere in RAM. As a test, disable the swap file, or all swap files if you have multiple! It's not practical, but the effect might indicate a memory-related issue. (Turn them back on after the test!)
Workshop Alex
+3  A: 

I am with everyone else who is saying that the problem is buffers filling and then emptying. Try turning on autoflush to avoid having a buffer (in Perl):

#!/usr/bin/perl

use strict;
use warnings;

use IO::Handle;

my $filename = "output.txt";

open my $numbers_outfile, ">", $filename
    or die "could not open $filename: $!";

$numbers_outfile->autoflush(1);

# each pass through the outer loop should write 1 GB
for (1 .. 20) {
    # each pass through the inner loop should write 1 MB
    for (1 .. 1024) {
        # print 1 MB of Zs
        print {$numbers_outfile} "Z" x (1024 * 1024);
    }
}

close $numbers_outfile
    or die "could not close $filename: $!";

Buffers can be good if you are going to print a little, do some work, print a little, do some work, etc. But if you are just going to blast data onto disk, they can cause odd behavior. You may also need to disable any write caching your filesystem is doing. A middle ground is sketched below.
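If your real program alternates printing with other work, a compromise is to leave autoflush off and flush explicitly at checkpoints. A minimal sketch, reusing the $numbers_outfile handle from above but without the autoflush(1) call (the 64 MB interval is an arbitrary choice):

# Buffered writes, flushed manually at checkpoints:
for my $chunk (1 .. 1024) {
    print {$numbers_outfile} "Z" x (1024 * 1024);
    $numbers_outfile->flush if $chunk % 64 == 0;   # push 64 MB batches to the OS
}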

Chas. Owens
Thanks. I have now tried 64 bit Perl (see updated question), but the next step will be to try turning on autoflush.
Peter Mortensen
Remember, you may also need to modify your filesystem if it is keeping buffers around.
Chas. Owens
autoflush makes Perl issue a write to the OS after every print. In your example performance will be good because you print 1 MB at a time, but if you print 'a', 'b', 'c', 'd' as separate prints it will be very bad, because that is four system calls of one character each... watch out for that.
Zan Lynx
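A minimal sketch of the batching idea Zan Lynx describes (the $out handle and the data are assumptions for illustration):

# With autoflush on, combine small pieces into one print,
# so they cost one system call instead of one each:
my @pieces = ('a', 'b', 'c', 'd');
print {$out} join('', @pieces);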