views: 1020
answers: 9

What kinds of tools do you use to determine the efficiency of code? Do you use home grown applications that run a statistically significant number of tests, or some commercial product? Do you use your own knowledge to test certain areas of your code, or some tool that analyzes your code for weak spots?

A: 

For .NET I would recommend NDepend.

Vadim
I thought NDepend was more for static code analysis. It now has runtime performance instrumentation?
MikeJ
@MikeJ, you're right, NDepend does static code analysis. I don't think the question is specific to runtime, though. BTW, at my work we use ANTS Performance Profiler (http://www.red-gate.com/products/ants_performance_profiler/index.htm) for runtime analysis.
Vadim
+14  A: 

This is called profiling. There are lots of off-the-shelf tools available to help you determine where the bottlenecks in your applications are, for all kinds of different languages. For example, the TPTP set of tools for Java can show you where performance bottlenecks are down to the individual method level, if you want it to. Of course, sometimes all you really need is a couple of reads of the system timer to get a general idea about a section of code.

Mike Daniels
+1  A: 

Profilers are very useful for seeing which code you spend the most time in. There are many profiling tools out there; they are usually specific to the platform/development environment you are in.

For small cases I have used simple timers in the code (system time at end of action - system time at start of action).

One important rule: don't ever assume that the performance optimization you just put in will actually run faster. Always verify!

onedozenbagels
+1  A: 

I do use tools for that job. Otherwise I would find it quite hard to do by hand... (well, that's my opinion; I never actually tried)

On Linux I use Valgrind, which comes with several useful tools for profiling your code. As per the Valgrind home page:

It runs on the following platforms: X86/Linux, AMD64/Linux, PPC32/Linux, PPC64/Linux.

.. and it's free (as in free beer) and open source.

While I've not used them that much, on other platforms you may use Purify/Quantify (IBM products). They are commercial tools. As Will reported, they are delivered in the PurifyPlus package, and Quantify will help you in profiling your application.

That's all I've used... :-)

yves Baumes
I think Purify is bundled with Quantify, which is a profiler. I'm not certain, though, but they are both part of PurifyPlus.
Will Mc
Good point. Thanks, Will! I updated the answer with your comment.
yves Baumes
+1  A: 

Honestly, I use NUnit. If I have code that is taking too long or doesn't seem to scale, I write a test simulating the part of the app that is performing poorly. Then I look at making strength-reduction substitutions in the code to cut the run time, and verify that I didn't break anything.

If this still isn't giving you what you need, then you need to find and apply a profiler, but at least you will have a test case on which you can try out your assumptions, without having to open/load the application or track calls to parts of the application that aren't consequential to the task at hand.

MikeJ
You can use MbUnit, which reports exactly how long each test took to execute. However, for runtime analysis ANTS Performance Profiler is much more efficient than any unit test framework.
Vadim
Yes. I agree that you need both. One to identify the hotspots and one to assure that changes to reduce computational effort do not result in incorrect operation.
MikeJ
A: 

The question is ambiguous. Efficient from a runtime or memory standpoint? The context in which a particular piece of code is run, the architecture of the app and the data that might get thrown at it are all factors too.

BTW, Someone mentioned Purify; there's also its sister product Quantify.

no-op
+9  A: 

INSERTED: Let's look at "statistical significance".

Assume there's a function call instruction somewhere. You can't necessarily see it - a class, or a macro, or the compiler, may have slipped it in. There are other calls to the same function nearby, but this call is in a loop, or its arguments are such as to make this call take a long time. So much time in fact, that if this call could be made to take zero time, then total execution time would be reduced by some amount, say 90%. (Impossible? Not at all.) Would timing pinpoint it? No. Would a call graph pinpoint it? No. Would call counts pinpoint it? No. Because the problem is not at the function level, but at the call instruction level.

Somehow the program is randomly stopped at a point in time, and its state is examined. Will it have stopped in that 90% of the time that would be saved if the instruction could be "zeroed"? Of course - with 90% probability, and the instruction will be pinpointed on the stack waiting for its "work" to be completed.

In fact, if you stop it randomly 20 times, that instruction will be on the stack an average of 18 times, with a standard deviation of +/- 1.3 times.

Is that statistically significant? You bet it is.
Did you need a large number of samples? You bet you didn't.

Suppose the percentage is small, like 10% or 5%. The same principle applies.

In fact, no matter how few samples are taken, any instruction that is on >1 sample is statistically significant, and is a "hot spot", "bottleneck", or whatever you want to call it. If you could remove it, call it less, or somehow reduce it, it will save significant time. (Some you can't, like "call _main", but others you can. You just need to find them.)

Of course my code would never be so stupid, would it? Well then, prove it.

OK, now back to the story . . .

ORIGINAL ANSWER: I had heard of profilers, way back when, and I thought they must be pretty neat, but I didn't have access to them/it. I was working on an embedded processor (an 8086 intel chip) that seemed to be taking an awful long time painting some floating-point numbers on a display screen. The hardware guys suggested providing from their plenty - adding timer chips so I could see how long things were taking. However, one weekend I fired it up with an Intel "Blue Box" in-circuit emulator and got it running. While it was being slow, I wondered "What the heck is it doing?". So I just halted it to find out. The PC was in the floating-point library (there was no FP chip). That was no surprise, since it was painting floating-point numbers, but I wanted to know more. So I (laboriously) read the hex memory so as to follow the call stack. Guess what? It was in the process of taking the number to be painted, dividing it by 10, converting to integer, converting back to float, subtracting, and so on, just to get the next digit to paint. Needless to say, there were better ways to do that, resulting in a speedup of around 10 times. That was found with one (1) sample!

On another occasion, on a 68K chip, there was some slowness. Again, a profiler was not available, but debugger "adb" was, so while it was being slow, I halted it a few times. The PC was in the math library, actually in the 32-bit integer multiply routine. Looking up the stack, I found this code:

struct {...} a[...];

int i;
for (i = 0; i < ...; ++i){ ... a[i] ... }

There's no call to multiply in there - what's going on? Turns out, for a[i], the compiler has to multiply i by the array element size. Since i is 32 bits (in that compiler) it generates a call to the 32-bit multiply routine, and the stack pinpoints that call instruction. By declaring i as short, the loop tripled in speed!

What's the point? If you take samples at random times while the program's being slow, the PC will tell you what it's doing, but the call stack will tell you why, and lead you straight to the problem. Now, unless the problem is really bad, you may need to take multiple samples. Any statement that shows up on >1 sample is one you can be suspicious of. Notice, it pinpoints statements, instructions even, not big chunks of code like functions. This technique may be "quick and dirty", but it is very effective.

ADDED: If you do this repeatedly, you can solve problem after problem in the same software. For example, if you get a speedup of 3 times, small performance problems that may have been insignificant before now consume 3 times more of the remaining time. That makes them much easier to hit with samples. You may have to add a temporary outer loop to make it run long enough to sample. In this way, I've seen compounded speedup factors of more than 40 times.

Mike Dunlavey
This technique is certainly effective. I discovered it this week through another Stack Overflow post, http://stackoverflow.com/questions/375913, and tried it on one of my applications used by many internal and external users. The result was an overall speed-up of 7 times (!) in the first iteration - from 16.5 minutes to 2.2 minutes on one particular (typical) dataset.
Peter Mortensen
@Peter: How many stack samples did you have to take in order to find the problem? If you got a 7 times speedup, that implies each sample had (at least) a 6/7 probability of showing it.
Mike Dunlavey
@Mike Dunlavey: I only needed 3-4. It was already clear by then. In the next iteration I took 20, and it gave me the idea for a change that led to a speed-up of 3.3 times for another mode of operation (in fact by far the most time-consuming of the entire application) in my application (http://msquant.sourceforge.net/).
Peter Mortensen
@Peter: That looks like a really interesting product.
Mike Dunlavey
+1  A: 

I use Valgrind and its tool Callgrind. It's a great tool. Valgrind is basically a virtual machine:

Valgrind is in essence a virtual machine using just-in-time (JIT) compilation techniques, including dynamic recompilation. Nothing from the original program ever gets run directly on the host processor. Instead, Valgrind first translates the program into a temporary, simpler form called Intermediate Representation (IR), which is a processor-neutral, SSA-based form. After the conversion, a tool (see below) is free to do whatever transformations it would like on the IR, before Valgrind translates the IR back into machine code and lets the host processor run it. Even though it could use dynamic translation (that is, the host and target processors are from different architectures), it doesn't. Valgrind recompiles binary code to run on host and target (or simulated) CPUs of the same architecture.

Callgrind is a profiler built upon that. The main benefit is that you don't have to run your application for hours to get a reliable result; a few seconds are sufficient, because Callgrind is a non-probing profiler.

Another tool built upon Valgrind is Massif. I use it to profile heap memory usage. It works great: it gives you snapshots of memory usage -- detailed information on WHAT holds WHAT percentage of memory, and WHO put it there.

One more Valgrind tool -- DRD (and Helgrind). I use them to track down deadlocks and data races in my code, as well as thread API abuse.

A: 

I've heard good things about YourKit for Java profiling. For web sites, I've tried JMeter, which has basic functionality for simple parameterized profiling.

egaga