I have a program written in Ruby and C. The C portion is a shared library, which is an extension for the Ruby program. I want to profile the C shared library I wrote, using gprof. I compile the shared library like this:

gcc -I. -I/usr/lib/ruby/1.8/i486-linux -I/usr/lib/ruby/1.8/i486-linux -I. -D_FILE_OFFSET_BITS=64  -fPIC -fno-strict-aliasing -g -march=i686 -O2 -ggdb -pg -fPIC -c extension.c
gcc -shared -o extension.so extension.o -L. -L/usr/lib -L. -Wl,-Bsymbolic-functions -rdynamic -Wl,-export-dynamic  -lruby1.8  -lpthread -lrt -ldl -lcrypt -lm -lc

Then I execute the Ruby program, which loads this shared library, and I expect a gmon.out file to appear in the current directory, but for some reason gmon.out never gets created. How do I get gprof profiling data for the shared library?

I googled for this but couldn't find a satisfactory answer that actually worked.

P.S. - As a workaround I can keep a modified version of the extension that builds as a pure C program (instead of as a shared library) and profile that, but it becomes tedious to maintain two versions of the same C extension (there are a large number of differences between the two).

I also tried writing a C program that uses the shared library directly. I immediately get a page fault in one of the Ruby library functions that gets called during the initialization of the shared library. I think the library really expects to be loaded from a Ruby program, which may internally be doing some magic.

(gdb) bt
#0  0x0091556e in st_lookup () from /usr/lib/libruby1.8.so.1.8
#1  0x008e87c2 in rb_intern () from /usr/lib/libruby1.8.so.1.8
#2  0x008984a5 in rb_define_module () from /usr/lib/libruby1.8.so.1.8
#3  0x08048dd0 in Init_SimilarStr () at extension.c:542
#4  0x0804933e in main (argc=2, argv=0xbffff454) at extension.c:564
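
Presumably the "magic" is the interpreter initialization that ruby performs before it calls Init_SimilarStr(). A standalone driver would apparently have to do the same thing before touching the extension. A minimal sketch, assuming Ruby 1.8's embedding API (I have not verified this):

/* standalone.c -- sketch only: initialize the interpreter before calling the
 * extension's entry point, so rb_intern()/rb_define_module() have a live VM
 * behind them instead of crashing in st_lookup(). */
#include <ruby.h>

void Init_SimilarStr(void);   /* entry point defined in extension.c */

int main(void)
{
    ruby_init();              /* sets up the interpreter state, including the symbol table */
    ruby_init_loadpath();     /* optional: standard $LOAD_PATH setup */

    Init_SimilarStr();        /* the call that was faulting in the plain driver */

    /* ... call the extension's C functions here and profile as usual ... */
    return 0;
}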

Update: Never mind. I used #ifdef to compile out the Ruby portions of the extension and got a profile. Closing.

A: 

You can run the ruby interpreter itself through the profiler. If that is too much, write a small C program that loads the shared library and calls its exported functions. Then profile that C program. It saves you from maintaining two versions of the library.
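
Something along these lines, for example. This is a sketch only; compute_distance is a placeholder for whatever C function the extension actually exports, so the name and signature need to be adjusted.

/* driver.c -- loads the extension and calls one exported C function directly,
 * bypassing the Ruby interpreter entirely (placeholder names throughout). */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *handle = dlopen("./extension.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return EXIT_FAILURE;
    }

    /* look up a plain C function exported by the library (placeholder name) */
    int (*compute_distance)(const char *, const char *) =
        (int (*)(const char *, const char *)) dlsym(handle, "compute_distance");
    if (!compute_distance) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return EXIT_FAILURE;
    }

    /* exercise the code paths that should show up in the profile */
    printf("%d\n", compute_distance("Rome", "Damascus"));

    dlclose(handle);
    return EXIT_SUCCESS;
}

Build the driver with -pg (for example gcc -pg -o driver driver.c -ldl), run it, and point gprof at the resulting gmon.out. Bear in mind that gprof's coverage of code that lives inside the shared object itself is limited.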

Vijay Mathew
Yeah, I guess I can profile the ruby interpreter itself - except maybe I'll have to go through a lot of cruft I am not interested in. I think it's possible to profile just the shared library itself, since I googled and found references suggesting that's possible, but I didn't find an authoritative link telling me exactly how to do it.
Sudhanshu
+1  A: 

I found oprofile to be a much better option than gprof for profiling in this situation; its reports are far more comprehensive. Using #ifndef PROFILE, I compiled out the Ruby portions of the C extension that were causing the seg-faults (not all of them were) and replaced them with non-Ruby code, wrote a main() routine within the extension itself to call its functions, and set up a makefile that compiles the extension as a plain C program with PROFILE defined. Then I installed oprofile on Ubuntu.
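
Roughly, the #ifndef PROFILE split inside extension.c looks like this. This is only an illustration: the names, signatures, and the rb_* glue shown are placeholders, not the actual code.

/* extension.c -- illustrative sketch of the #ifndef PROFILE split.
 * The functions being profiled (levenshtein(), corpora_pass1(), ...) are
 * defined elsewhere in the file, unchanged, and shared by both builds. */
int levenshtein(const char *a, const char *b);   /* real signature may differ */

#ifndef PROFILE
/* normal Ruby-extension build: only here does the code touch Ruby */
#include <ruby.h>

void Init_SimilarStr(void)
{
    VALUE mod = rb_define_module("SimilarStr");
    /* ... rb_define_module_function(mod, ...) glue goes here ... */
    (void)mod;   /* silence unused-variable warnings in this stripped-down sketch */
}
#else
/* profiling build (compiled with -DPROFILE): no Ruby at all, plus a main() driver */
#include <stdio.h>

int main(int argc, char **argv)
{
    int i;
    for (i = 2; i < argc; i++)
        printf("%s vs %s: %d\n", argv[1], argv[i], levenshtein(argv[1], argv[i]));
    return 0;
}
#endif

With that profiling build in place, I wrote this script to drive the run: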

#!/bin/bash
sudo opcontrol --reset                          # discard any previously collected samples
sudo opcontrol --start                          # start the oprofile daemon and begin sampling
./a.out Rome Damascus NewYork Delhi Bangalore   # run the workload being profiled
sudo opcontrol --shutdown                       # stop sampling and kill the daemon
opreport -lt1                                   # per-symbol report, hiding entries below 1%

Compiled my program, and executed the above script, which gives output like this from the "opreport" command:

...
...
Killing daemon.
warning: /no-vmlinux could not be found.
warning: [vdso] (tgid:10675 range:0x920000-0x921000) could not be found.
warning: [vdso] (tgid:1270 range:0xba1000-0xba2000) could not be found.
warning: [vdso] (tgid:1675 range:0x973000-0x974000) could not be found.
warning: [vdso] (tgid:1711 range:0x264000-0x265000) could not be found.
warning: [vdso] (tgid:1737 range:0x990000-0x991000) could not be found.
warning: [vdso] (tgid:2477 range:0xa53000-0xa54000) could not be found.
warning: [vdso] (tgid:5658 range:0x7ae000-0x7af000) could not be found.
CPU: Core Solo / Duo, speed 1000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name                 symbol name
12731    32.8949  a.out                    levenshtein
11958    30.8976  a.out                    corpora_pass2
5231     13.5161  no-vmlinux               /no-vmlinux
4021     10.3896  a.out                    corpora_pass1
1733      4.4778  libc-2.10.1.so           /lib/tls/i686/cmov/libc-2.10.1.so
542       1.4004  ld-2.10.1.so             /lib/ld-2.10.1.so
398       1.0284  a.out                    method_top_matches

There it is: the top consumer is the function levenshtein(). I followed this with another command, opannotate, to generate disassembled output annotated with the source code and the execution count/time of each line. The output looks like this (counts/times are on the left of each executed line):

> opannotate --source --assembly ./a.out > report.as.handcoded.1
> cat report.as.handcoded.1

...
...
...
           :         __asm__ (
 2  0.0069 : 804918a:       mov    -0x50(%ebp),%ecx
 4  0.0137 : 804918d:       mov    -0x54(%ebp),%ebx
           : 8049190:       mov    -0x4c(%ebp),%eax
12  0.0412 : 8049193:       cmp    %eax,%ecx
10  0.0344 : 8049195:       cmovbe %ecx,%eax
 8  0.0275 : 8049198:       cmp    %eax,%ebx
11  0.0378 : 804919a:       cmovbe %ebx,%eax
16  0.0550 : 804919d:       mov    %eax,-0x4c(%ebp)
           :                   "cmp     %0, %2\n\t"
           :                   "cmovbe  %2, %0\n\t"
           :                  : "+r"(a) :
           :                    "%r"(b), "r"(c)
           :                  );
           :          return a;
 ...
 ...
 ...
Sudhanshu
+1  A: 

You can do better than gprof. Consider stackshots. You can do it using pstack, lsstack (if you can get it), or by manually pausing under the debugger. Here's a short intro to the technique.

Mike Dunlavey
Mike, isn't the per-instruction count + time given by the opannotate program (see my own answer here) more or less the same technique that you describe there?
Sudhanshu
@Sudhanshu: I just looked at the oprofile doc, and I'm not sure, but it looks like it's close to what I think is needed. It does sample the call stack, but I can't tell for sure if it samples on wall-clock time. (Some profilers don't sample during system calls, and that's useless.) Then if the number it shows on a call-site (or an ordinary instruction) is the fraction of samples containing that call site, then it's doing the right thing. Another profiler that does this is RotateRight/Zoom.
Mike Dunlavey
... the other thing it needs to do is let you control when it samples and when not (like when it's waiting for user input).
Mike Dunlavey
... I don't think counts are useful. The times are useful if what they mean is fraction-of-samples. Then I want to see fractions of at least 0.1 or 0.05. In my experience, problems worth fixing tend to cost > 10%.
Mike Dunlavey
... I like to see the actual stack samples, rather than just summary numbers, because then I can form a verbal description of exactly what it's doing and why, and tell what fraction of the time it's doing it. I also like to include, if necessary, other state information as well (like what *was* the string being looked up). That tells me if it was really necessary.
Mike Dunlavey
+1 @Mike: Your suggested methodology is good, and I think oprofile more or less does that kind of sampling. Oprofile actually profiles the whole system (kernel + all user programs), although I can tell it to output a report for a specific executable (in this case, my own program). The "no-vmlinux" line in the report is the time spent inside the kernel; in my program, however, most of the time is spent in user space. And the annotated disassembled output gives me the time spent line by line. I can see where you're going with stack samples, and I agree it can give some good insights.
Sudhanshu
@Sudhanshu: Thanks for making me look at oprofile. So now that's two profilers I'm aware of that do "the right thing" (more or less). BTW, if you take your own samples in the debugger, you can trace through the Ruby interpreted stack (with some work) if you want to explore the possibility that some of the Ruby code could be optimized.
Mike Dunlavey