I'm adapting an application that makes heavy use of generators to produce its results, so that it can be served through a web.py web interface.

So far, I have been able to wrap the for-loop and its output-producing statements in a function and profile that with cProfile.run() or runctx(). Conceptually:

import cProfile

def output():
    for value in generator():
        print(value)

cProfile.run('output()')

In web.py, I have to wrap it the following way, since I want to immediately produce output from the potentially long-running computation in each iteration step using yield:

class index:
    def GET(self):
        for value in generator():
            yield make_pretty_html(value)

Is there a way to profile all calls to the generator like in the first example when it's used like in the second one?

A: 

Can you just use time.time() to profile the parts you are interested in? Just get the current time and subtract from the last time you made a measurement.
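
For example, a minimal sketch of that approach (the generator body here is just a stand-in for your real one):

import time

def generator():
    # stand-in for the real, long-running generator from the question
    for i in range(3):
        time.sleep(0.1)
        yield i

last = time.time()
for value in generator():
    now = time.time()
    print("got value %r after %.3f seconds" % (value, now - last))
    last = time.time()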

karpathy
I already get the total timing, but "Processing took 5.382 seconds" isn't specific enough to find performance bottlenecks. I use a rather large and branching chain of generators internally, and I intend to store user input and the resulting performance data for later analysis. I have several functions that take an average of 0.000 seconds per call but might be called tens of thousands of times.
Daniel Beck
In that case, you could keep an integer counter for each of those functions and only take a measurement on every 1000th execution (see the sketch below). You could measure chunks of code this way and narrow down the bottlenecks. I realize it may be a bit tedious, depending on the code.
karpathy
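
A rough sketch of that sampling idea, with a hypothetical timed_sometimes() helper (not from the original code):

import time

call_count = 0

def timed_sometimes(func, *args):
    # hypothetical helper: time only every 1000th call to keep overhead low
    global call_count
    call_count += 1
    if call_count % 1000 == 0:
        start = time.time()
        result = func(*args)
        print("%s took %.6f seconds" % (func.__name__, time.time() - start))
        return result
    return func(*args)

total = sum(timed_sometimes(len, [0] * i) for i in range(5000))
print(total)
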
+1

A: 

It sounds like you're trying to profile each call to 'next' on the generator? If so, you could wrap your generator in a profiling generator. Something like this, where the commented-out part would send the results to a log or database.

import time

def iter_profiler(itr):
    itr = iter(itr)
    while True:
        start = time.time()
        try:
            value = next(itr)
        except StopIteration:
            break
        end = time.time()
        # do something with the (end - start) timings here,
        # e.g. send them to a log or database
        yield value

Then instead of instantiating your generator as `generator()` you would use `iter_profiler(generator())`.
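
Applied to the handler from the question, that would look like:

class index:
    def GET(self):
        for value in iter_profiler(generator()):
            yield make_pretty_html(value)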

wybiral
Alternatively, you could apply a modified version of it as a decorator at the definition of the generator, as sketched below.
aaronasterling
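
A sketch of that decorator variant (profile_iterations is a made-up name; it reuses iter_profiler from the answer above):

import functools

def profile_iterations(gen_func):
    # wrap the generator function so every generator it creates is profiled
    @functools.wraps(gen_func)
    def wrapper(*args, **kwargs):
        return iter_profiler(gen_func(*args, **kwargs))
    return wrapper

@profile_iterations
def generator():
    for i in range(10):
        yield i * i
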
Basically, you're right. I already count the milliseconds before and after the `for` loop in my original post's second example, so I have this information (I measure slightly more than you would, but a few milliseconds don't matter). However, I care more about the "hot spots" in my code (where does computing time go to die?) than about exactly which of the dozens of results took longer than the others. Therefore I was hoping for a `profile`/`cProfile` based solution that does not generate one report per result. If there's a way to merge profiler reports, that would be the way to go.
Daniel Beck
A: 

I finally found a solution. The key is that `cProfile.Profile.runcall()` passes through the return value of the profiled call, so each call to `next()` can be profiled individually:

import cProfile
import math

def gen():
    for i in range(1, 10):
        yield math.factorial(i)

class index(object):
    def GET(self):
        p = cProfile.Profile()

        it = gen()
        while True:
            try:
                # runcall() profiles the call and passes through its return value
                nxt = p.runcall(next, it)
            except StopIteration:
                break
            print(nxt)

        p.print_stats()

index().GET()

I can also merge multiple such profiling results (once I start dumping them under unique file names) via `pstats.Stats.add()`, and store/analyze the combined data.
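
For example, a sketch of that merging step with pstats (the file names are made up):

import cProfile
import pstats

# profile two runs and dump each to its own uniquely named file
for name in ('run-1.prof', 'run-2.prof'):
    p = cProfile.Profile()
    p.runcall(sum, range(100000))
    p.dump_stats(name)

# merge the dumps into one combined report
stats = pstats.Stats('run-1.prof')
stats.add('run-2.prof')
stats.sort_stats('cumulative').print_stats()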

Daniel Beck