This relates to some software I've been given to "fix". The easiest and quickest fix would have it open roughly 10 random files out of hundreds, extract some very short strings for processing, and immediately close them. Another process may come along right after and do the same thing to different (or the same) random files, and this may happen hundreds of times within a few seconds.

I know modern operating systems cache those files in memory up to a point, so disk thrashing isn't the issue it once was, but I'm looking for any articles or discussions on how to determine when all this opening and closing of many random files becomes a problem.
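For concreteness, here is a minimal sketch of the access pattern described above. The `data/` directory, the 64-byte prefix, and the `process` function are all assumptions for illustration, not details from the actual software.

```python
import random
from pathlib import Path

DATA_DIR = Path("data")   # hypothetical location of the hundreds of small files
FILES_PER_PASS = 10       # "10 random files" per run, as described
PREFIX_BYTES = 64         # "very short strings": read only a small prefix

def process(snippet: bytes) -> None:
    # Placeholder for whatever the real program does with the extracted string.
    pass

def one_pass() -> None:
    """Open a handful of random files, read a short prefix, and close them again."""
    candidates = [p for p in DATA_DIR.iterdir() if p.is_file()]
    for path in random.sample(candidates, min(FILES_PER_PASS, len(candidates))):
        with path.open("rb") as f:          # open ...
            snippet = f.read(PREFIX_BYTES)  # ... read a very short string ...
        process(snippet)                    # ... file is already closed here

if __name__ == "__main__":
    one_pass()
```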

A: 

When your working set (the amount of data read by all your processes) exceeds your available RAM, your throughput will tend towards the I/O capacity of your underlying disk.

From your description of the workload, seek times will be more of a problem than data transfer rates.

When your working set stays below the amount of RAM you have, the OS will keep all the data cached and won't need to go to disk once its caches are warm.
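One rough way to check which regime you're in is to read the whole working set twice and compare the timings; this is only a sketch, and it reuses the hypothetical `data/` directory from above. If the second pass is dramatically faster, the data is being served from the page cache; if both passes are similarly slow, the cache is being evicted and disk seeks dominate. For a truly cold first pass you would need to clear the page cache first (on Linux, by writing to `/proc/sys/vm/drop_caches` as root) or measure right after a reboot.

```python
import time
from pathlib import Path

DATA_DIR = Path("data")   # hypothetical location of the files

def read_everything() -> float:
    """Read every file once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    for path in DATA_DIR.iterdir():
        if path.is_file():
            path.read_bytes()
    return time.perf_counter() - start

if __name__ == "__main__":
    cold = read_everything()   # first pass: may have to hit the disk
    warm = read_everything()   # second pass: page cache, if the working set fits in RAM
    print(f"cold pass: {cold:.3f}s, warm pass: {warm:.3f}s")
```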

David Schmitt