I am trying to simulate application load in order to measure application performance. Dozens of clients send requests to a server, and a significant part of request processing is reading random data from an HDD (random file, random file offset).

I use 15 GB of data in 400 files.

The HDD does its best to cache read operations, so overall performance is very unstable from run to run (+/- 5..10%).

To minimize the effect of HDD-internal optimizations, I am thinking of putting the data on a dedicated physical HDD, creating the random files before every test run, using the same random file access sequence (sequence of files and offsets), running the test, and formatting the HDD at the end. I suppose this will clear all internal HDD caches and file access predictions.
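For illustration, here is a minimal sketch of the "same random file access sequence" idea, assuming a fixed seed is an acceptable way to reproduce it. The function name `make_access_sequence` and all of its parameters are hypothetical, and all files are assumed to have the same size.

```python
import random


def make_access_sequence(seed, num_files, file_size, read_size, count):
    """Return a reproducible list of (file_index, offset) pairs.

    Assumption: every file is file_size bytes long and every read is
    read_size bytes, so the largest valid offset is file_size - read_size.
    """
    rng = random.Random(seed)  # private generator: same seed, same sequence
    max_offset = file_size - read_size
    return [
        (rng.randrange(num_files), rng.randrange(max_offset + 1))
        for _ in range(count)
    ]


# Two calls with the same seed produce the same plan for every test run.
plan_a = make_access_sequence(seed=42, num_files=400, file_size=1_000_000,
                              read_size=4096, count=1000)
plan_b = make_access_sequence(seed=42, num_files=400, file_size=1_000_000,
                              read_size=4096, count=1000)
```

Storing the seed together with the test results would let any single run be replayed later with exactly the same files and offsets.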

What should I do to minimize the dispersion of the performance results? Is there a simpler (or maybe more appropriate) way to get stable performance results?

Thank you in advance!

+2  A: 

Essentially all modern hard drives do include caching. It seems to me that results without a cache might be more uniform, but would be uniformly meaningless.

In any case, there are commands to disable caching on most drives (but, if memory serves, they're probably extensions, not part of the standard, so you'd have to implement them specifically for a particular target drive).

OTOH, given that you want to simulate something that isn't how a real hard drive (normally) works, I'd consider writing it as a complete software simulation -- e.g., have some sort of hard-drive class that kept a "current track", with commands to read and write data, seek to another track, etc. The class would keep track of things like the amount of (virtual) time consumed for each operation.
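A rough sketch of what such a class might look like, purely as an illustration. The class name `SimulatedDisk`, its methods, and the timing constants are all hypothetical and not measured from any real drive; the cost model (a fixed settle time plus a per-track seek cost plus a per-KB transfer cost) is a deliberate simplification that ignores rotational latency and caching.

```python
class SimulatedDisk:
    """Toy hard-drive model that tracks virtual time instead of doing real I/O."""

    def __init__(self, num_tracks, seek_ms_per_track=0.01,
                 settle_ms=1.0, transfer_ms_per_kb=0.01):
        # All timing constants are made-up placeholders, not real drive specs.
        self.num_tracks = num_tracks
        self.seek_ms_per_track = seek_ms_per_track
        self.settle_ms = settle_ms
        self.transfer_ms_per_kb = transfer_ms_per_kb
        self.current_track = 0
        self.elapsed_ms = 0.0  # total virtual time consumed so far

    def seek(self, track):
        """Move the head to the given track and charge the seek time."""
        if not 0 <= track < self.num_tracks:
            raise ValueError("track out of range")
        distance = abs(track - self.current_track)
        if distance:
            self.elapsed_ms += self.settle_ms + distance * self.seek_ms_per_track
        self.current_track = track

    def read(self, track, size_kb):
        """Seek to the track, then charge the transfer time for size_kb."""
        self.seek(track)
        self.elapsed_ms += size_kb * self.transfer_ms_per_kb

    def write(self, track, size_kb):
        """Writes are charged the same as reads in this simplified model."""
        self.read(track, size_kb)


disk = SimulatedDisk(num_tracks=1000)
disk.read(track=100, size_kb=4)   # seek from track 0, then transfer
disk.read(track=100, size_kb=4)   # same track: transfer cost only
```

Because the model is deterministic, the same sequence of operations always consumes the same virtual time, which is the run-to-run stability the question asks for, at the cost of no longer measuring the real drive.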

Jerry Coffin
Thank you for the reply. I don't want to disable caching, but to make caching behave the same from run to run. I suspect that on the second test run (and the third, the fourth, ...) the HDD has cached more and more of the data the application uses, so the tests tend to execute faster and faster. Some random file access sequences execute much faster than others. But I am afraid to reuse the same file access sequence because it will be completely cached.
Andrew Florko