I have a Ruby program that takes about 4 minutes to complete its task, and I'd like to get it down to under 1 minute.

I tried the ruby-prof gem, but enabling it increases the running time to ~30 minutes, and it doesn't even seem to preserve monotonicity particularly well (some changes reliably improve performance with the profiler and just as reliably degrade performance without it). The task also cannot really be broken down into parts that can be meaningfully profiled independently.

What's currently the best way to profile Ruby code with the lowest overhead?

I use OS X, but if for any reason a profiler requires another OS, I'm willing to reboot.

EDIT: perftools.rb has much lower overhead, but its results look rather dubious, to be honest: they are way outside any reasonable sampling error. At the very least it must be interfering with GC or I/O buffering or something like that, causing a lot of silly misattributions. It still beats ruby-prof.
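
For anyone unfamiliar with it, perftools.rb usage looks roughly like the sketch below (based on its README; the output path and the profiled block are placeholders, not my actual code):

require 'perftools'

# Profile just this block and write the raw samples to the given file.
PerfTools::CpuProfiler.start("/tmp/task_profile") do
  run_the_slow_task   # hypothetical placeholder for the actual 4-minute workload
end

# Then, from the shell:
#   pprof.rb --text /tmp/task_profile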

I'll keep the question open in case someone knows anything better than that.

+4  A: 

I don't think you can do any better with either MRI or YARV.

Rubinius, however, has a built-in profiler (just run it with -Xprofile) with much less overhead.

And with JRuby, you get the whole array of Java tooling, which includes some of the best profilers ever created. Even without specific support for JRuby, those tools can be quite helpful. Oracle JDK has this extremely cool VisualVM tool, which lets you visualize all sorts of stuff about your program (and AFAIK there's even a JRuby plugin for it). Oracle JRockit also has a great profiler. The Azul JVM is rumored to have an absolutely mindblowingly awesome profiler. I think J9 has a great one, too. And of course, there's YourKit.
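
For example, attaching VisualVM to a JRuby process can look roughly like this (a sketch, assuming VisualVM ships with your JDK; the script name is a placeholder):

jruby myscript.rb &   # run the Ruby program on the JVM
jvisualvm             # launch VisualVM, attach to the jruby process,
                      # then start the CPU sampler from the Sampler tab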

Charles Oliver Nutter and other members of the JRuby community have recently written a series of articles on understanding the runtime behavior of Ruby code using JRuby. Those articles were mostly written in reaction to the memprof library for MRI, so they tend to focus on memory profiling, but there is also some material on call profiling in there.

AFAIK, one of the goals for MacRuby was to be able to use Xcode's runtime-comprehension tools (Instruments and Co.) for Ruby, but that's more of a long-term goal, and I don't know whether it is currently implemented.

Here's a little example from Rubinius:

rbx -Xprofile -e'
  Hash.new {|fibs, n|
    fibs[n] = if n < 2 then n else fibs[n-1] + fibs[n-2] end
  }[100]
'

Which prints:

Total running time: 0.009895000000000001s

  %   cumulative   self                self     total
 time   seconds   seconds      calls  ms/call  ms/call  name
------------------------------------------------------------
   7.59    0.00      0.00        234     0.00     0.01  Hash#find_entry
   5.86    0.00      0.00        419     0.00     0.00  Hash#key_index
   5.49    0.00      0.00        275     0.00     0.00  Hash::Entry#match?
   4.97    0.01      0.00        234     0.00     0.02  Hash#[]

As you can see, one interesting property of the Rubinius profiler is that, since it can profile arbitrary Ruby code and Rubinius itself is mostly written in Ruby, it can profile deep into the system itself.

Jörg W Mittag
+2  A: 

Any profiler that gives you self-time, reports at the function level, treats accuracy and efficiency as the important goals, and gives you a call graph is based on the same set of concepts as the original gprof, with minor variations. ruby-prof is just one of numerous examples.

Here's why that is not good.

Here's a method that actually finds problems, so you can make your code run faster, and you don't have to buy or install anything.

Here's an example of using it to get a big speedup.
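
As one possible sketch in plain Ruby (an illustration of the idea, not necessarily the exact procedure behind those links): take a handful of whole-stack samples by hand with nothing more than a signal handler, and look at which lines keep showing up.

# Hit Ctrl+\ (SIGQUIT) a few times while the program runs; inside the trap
# handler, `caller` shows where the main thread was when it was interrupted.
Signal.trap("QUIT") do
  puts "---- stack sample ----"
  puts caller
end

A line that appears on, say, half of those samples is responsible for roughly half the running time, regardless of how that time is spread across individual calls.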

Mike Dunlavey
I understand the idea of sampling-based profiling, but `perftools.rb`, which does just that, has massive sampling artifacts; it's nothing like random in practice.
taw
@taw: I've tried my best to explain how not all sampling profilers are alike. There's an enormous difference between PC sampling and sampling the whole stack, between sampling during I/O and not, between function-level and line-level statistics, and between fussing about recursion in the graph and not. And the whole notion that profiling needs to be fast and/or accurate has been swallowed without question. I've tried my best to show that we need to take a hard look at those basics, and not just follow popular generalities.
Mike Dunlavey
It is whole-stack, and function-level. Profiling has a problem with I/O and GC because work accumulates, and the function most likely to trigger the I/O or GC flush (which is what the profiler records) is not the one that generated all that work. ("With the exception of other ways of requesting work to be done, such as by message posting" covers far more now than it did in the C era.)
taw
@taw: Here's my prescription: it should sample the whole stack, and it should not disable sampling at any time. If GC samples are considered noise, they can be discarded. Samples taken during I/O are very valuable, because the I/O, once you know why it's being done, can often be reduced a lot. The percentage cost of a line or function is the percentage of samples it appears on. If you only have function-level cost, you still have to hunt for the costly lines, but with stack samples they're in plain view, so there's no need to hunt. There are multiple kinds of problems: those that are easy to find and those that are not. ...
Mike Dunlavey
@taw: ... Maybe in languages like Ruby they specifically try not to record, as in a call stack, the reason why the code is currently being executed. Is that the case? If so, then the language designer has intentionally made performance tuning difficult, because the *only* way to speed up software is to find out what's taking a lot of time *unnecessarily*. If you don't know *why* something is being done, you don't know whether you can replace it, and the lines on the stack tell you why. The object is to *find* that code, not to measure other things accurately to multiple decimal places.
Mike Dunlavey
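For concreteness, the percent-of-samples figure described above could be computed from raw samples like this (a hedged Ruby sketch; `samples` is a hypothetical array of backtraces, e.g. gathered with the SIGQUIT trick shown earlier):

# Percentage cost of each line = percentage of samples it appears on.
def line_costs(samples)
  counts = Hash.new(0)
  samples.each do |stack|
    stack.uniq.each { |frame| counts[frame] += 1 }   # count a line once per sample
  end
  counts.sort_by { |_, n| -n }
        .map { |frame, n| [frame, 100.0 * n / samples.size] }
end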
@taw: There are those who try it, and those who would rather debate. If there were a good reason why it doesn't work, those who try it would say so: http://stackoverflow.com/questions/2624667/whats-a-very-easy-c-profiler-vc/2624725#2624725
Mike Dunlavey
@taw: I'm not the only one who's realized this. Here are some links to people who did it entirely on their own, including two profiler vendors: http://stackoverflow.com/questions/266373/one-could-use-a-profiler-but-why-not-just-halt-the-program/317160#317160 http://stackoverflow.com/questions/2473666/tips-for-optimizing-c-net-programs/2474118#2474118 http://www.rotateright.com/ http://www.lw-tech.com/ http://stackoverflow.com/questions/266373/one-could-use-a-profiler-but-why-not-just-halt-the-program/270705#270705
Mike Dunlavey