I'm using Xdebug with PHP to do some performance profiling. But when I run the same script more than once, I often get very different times, so it's hard to know how much faith to put in the results.

Obviously there's a lot happening on a machine that can affect PHP's performance. But is there anything I can do to reduce the number of variables, so multiple tests are more consistent?

I'm running PHP under Apache on Mac OS X.

+4  A: 
  1. Reduce the number of unrelated services on the box as much as possible.
  2. Cut down on the number of Apache processes.
  3. Prime the various caches by loading your script a few times. Possibly use a benchmarking tool like Apache's ab or siege to make sure all Apache children are hit (see the sketch after this list).
  4. Profile your script from the command line using curl or wget so that Apache only serves one resource: the script itself.
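
A rough sketch of steps 3 and 4, assuming ab and curl are installed and that http://localhost/script.php is a placeholder for your own URL:

    # warm the caches and hit several Apache children: 200 requests, 5 concurrent
    ab -n 200 -c 5 http://localhost/script.php

    # then take individual timed measurements of just the script (placeholder URL)
    curl -o /dev/null -s -w '%{time_total}\n' http://localhost/script.php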

There may be an argument for getting more "real world" numbers by omitting some of these steps. I look forward to other answers this question may receive.

Adam Backstrom
+1  A: 
  1. As others have said, reduce the running services and programs to a minimum
  2. Run your test multiple times in succession and average the results to smooth out outliers (see the sketch after this list)
  3. Make sure caching of any sort is disabled (unless you specifically want to test it with a cache)
  4. If the results still vary widely, the problem is most likely in your code. It might have race conditions or depend on network connections. You will get more detailed answers if you post the code.
  5. You also might be hitting bottlenecks on some of the runs. If you carefully profile different parts of the script, you might be able to catch them.
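
A minimal shell sketch of point 2, assuming the script is reachable at the placeholder URL http://localhost/script.php:

    # run 10 timed requests and print the mean response time (placeholder URL)
    for i in 1 2 3 4 5 6 7 8 9 10; do
      curl -o /dev/null -s -w '%{time_total}\n' http://localhost/script.php
    done | awk '{ sum += $1 } END { print "mean:", sum/NR, "s over", NR, "runs" }'
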
Eran Galperin
+3  A: 

There are two different tasks here: measuring performance and finding problems.

For measuring the time it takes, you should expect variability, because it depends on what else is going on in the machine. That's normal.

For finding problems, what you need to know is the percentage of time used by various activities. Percentages don't change much as a function of what else is running on the machine, and their exact values don't matter much anyway.

What matters is that you find activities responsible for a healthy percentage that you can actually fix, and then that you fix them. When you do, you can expect to save up to that percentage of the run time; the finding is the essential part, and the measuring is secondary.
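
As a rough illustration of the arithmetic (the 40% figure is made up): if one activity accounts for 40% of the run time and you eliminate it entirely, the best-case speedup is 1 / (1 - 0.40), or about 1.67x, so even an approximate percentage tells you whether a fix is worth making.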

Added: You might want to ask "Don't you have to measure in order to find?" Consider an example. Suppose you run your program with debugging turned on, and you randomly pause it, and you see it in the process of closing a log file. You continue it, and then pause it again, and see the same thing. Well that rough "measurement" says it's spending 100% of its time doing that. Naturally, the time spent doing it isn't really 100%, but whatever it is, it's big, and you've found it. So then maybe you don't have to open/close the file so often, or something. Typically, more samples are needed, but not too many.

Mike Dunlavey