
I am trying to measure the execution time of raw_queries(...), so far without success. I found that I should use the timeit module. The problem is that I don't know how to pass the function's arguments from the surrounding environment.

Important note: Before calling raw_queries, we have to execute phase2() (environment initialization).

Side note: The code is in Python 3.

from timeit import Timer

def raw_queries(queries, nlp):
    """ Submit queries without getting visual response """

    for q in queries:
        nlp.query(q)

def evaluate_queries(queries, nlp):
    """ Measure the time that the queries need to return their results """

    t = Timer("raw_queries(queries, nlp)", "?????")
    print(t.timeit())

def phase2():
    """ Load dictionary to memory and subsequently submit queries """

    # prepare Linguistic Processor to submit it the queries
    all_files = get_files()
    b = LinguisticProcessor(all_files)
    b.loadDictionary()

    # load the queries
    queries_file = 'queries.txt'
    queries = load_queries(queries_file)

if __name__ == '__main__':
    phase2()

Thanks for any help.

UPDATE: We can call phase2() using the second argument of Timer. The problem is that we need the arguments (queries, nlp) from the environment.
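A minimal sketch of what the setup argument (the second argument of Timer) actually does: it runs once per timeit() call, before the timed loop, and names it binds are visible to the statement. fake_phase2 and fake_raw_queries are hypothetical stand-ins for phase2() and raw_queries(), and the globals parameter used here needs Python 3.5+, which postdates this question:

```python
from timeit import Timer

# Counters so we can see how often each part runs.
calls = {"setup": 0, "stmt": 0}


def fake_phase2():
    """Hypothetical stand-in for phase2(): returns (queries, nlp)."""
    calls["setup"] += 1
    return ["q1", "q2"], None


def fake_raw_queries(queries, nlp):
    """Hypothetical stand-in for raw_queries()."""
    calls["stmt"] += 1


# globals= (Python 3.5+) lets the setup see fake_phase2 and the statement
# see fake_raw_queries; queries and nlp bound in setup reach the statement.
t = Timer("fake_raw_queries(queries, nlp)",
          "queries, nlp = fake_phase2()",
          globals=globals())

t.timeit(number=1000)
print(calls)  # {'setup': 1, 'stmt': 1000}

# repeat() calls timeit() once per round, so setup runs once per round.
t.repeat(repeat=3, number=10)
print(calls)  # {'setup': 4, 'stmt': 1030}
```

So the expensive initialization in phase2() is not part of the measured time, but it is repeated for every round of repeat().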

UPDATE: The best solution so far, with unutbu's help (only what has changed):

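from timeit import Timer

def evaluate_queries():
    """ Measure the time that the queries need to return their results """

    # The setup string runs once, before timing starts: it imports this
    # module (saved as main.py) and builds the arguments for raw_queries.
    t = Timer("main.raw_queries(queries, nlp)",
              "import main; (queries, nlp) = main.phase2()")

    # timeit(number=1000) returns the total seconds for 1000 calls,
    # which numerically equals the average milliseconds per call.
    sf = 'Execution time: {} ms'
    print(sf.format(t.timeit(number=1000)))


def phase2():
    ...

    return queries, b


def main():
    evaluate_queries()

if __name__ == '__main__':
    main()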
+1  A: 

I'm not sure about this, since I've never used it, but from what I've read it should be something like this:

from timeit import Timer
....
t = Timer("raw_queries(queries, nlp)", "from __main__ import raw_queries")
print(t.timeit())

I took this from http://docs.python.org/library/timeit.html (if this helps).

Alex
I have already tried it. It doesn't work because queries and nlp are not defined in the environment where the statement is executed. I use Python 3, if that makes any difference. Also, be aware that I have to call phase2() before measuring the time of raw_queries(). Thanks for replying.
myle
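Alex's setup only imports raw_queries, so queries and nlp are still undefined when the statement runs, which is exactly what myle hit. One way around this that avoids setup strings entirely: Timer also accepts a zero-argument callable, so a lambda can capture the arguments from the enclosing scope. A minimal sketch; StubProcessor is a hypothetical stand-in for the question's LinguisticProcessor:

```python
from timeit import Timer


def raw_queries(queries, nlp):
    """Same shape as the question's raw_queries()."""
    for q in queries:
        nlp.query(q)


class StubProcessor:
    """Hypothetical stand-in for LinguisticProcessor: counts queries."""

    def __init__(self):
        self.seen = 0

    def query(self, q):
        self.seen += 1


def evaluate_queries(queries, nlp, number=1000):
    # Timer accepts a zero-argument callable as well as a statement string;
    # the lambda closes over this function's own queries and nlp.
    t = Timer(lambda: raw_queries(queries, nlp))
    return t.timeit(number=number)


nlp = StubProcessor()
elapsed = evaluate_queries(["a", "b", "c"], nlp, number=100)
print(nlp.seen)  # 300: 3 queries x 100 runs
print(elapsed)   # total seconds for the 100 runs
```

The lambda adds one extra function call per iteration, which only matters if raw_queries itself is extremely cheap.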
+1  A: 

A custom timer function may be a solution:

import time

def timer(fun, *args):
    """Call fun(*args); return (result, elapsed wall-clock seconds)."""
    start = time.time()
    ret = fun(*args)
    end = time.time()
    return (ret, end - start)

Use it like this:

>>> from math import sin
>>> timer(sin, 0.5)
(0.47942553860420301, 6.9141387939453125e-06)

It means that sin returned 0.479... and it took 6.9e-6 seconds. Make sure your functions run long enough if you want to obtain reliable numbers (not like in the example above).

jetxee
Thanks for replying. Unfortunately, that doesn't fit my case: raw_queries is supposed to run in a very short time. We could work around this by using a loop, but the official documentation strongly recommends timeit, since it avoids several common timing pitfalls.
myle
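Since myle prefers timeit over a hand-written loop, jetxee's timer() interface can be kept while delegating the looping to timeit.repeat. A sketch, assuming Python 3 (for the keyword-only number and repeat parameters); the default counts are arbitrary choices, and reporting the minimum follows the timeit documentation's advice that the fastest round is the most informative:

```python
import timeit
from math import sin


def timer(fun, *args, number=10000, repeat=5):
    """Like jetxee's timer(), but lets timeit do the looping.

    Returns (result, best seconds per call), where "best" is the
    fastest of `repeat` rounds of `number` calls each.
    """
    ret = fun(*args)
    totals = timeit.repeat(lambda: fun(*args), repeat=repeat, number=number)
    return ret, min(totals) / number


value, per_call = timer(sin, 0.5)
print(value, per_call)
```

Because each round runs the call many times, even a very fast function like sin (or raw_queries) accumulates enough time to measure.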
+1  A: 

Normally, you would use timeit.
Examples are here and here.

Also note:

By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured.
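The timeit documentation suggests re-enabling GC as the first statement of the setup string when GC matters. A small sketch that records whether GC is on while the timed callable runs; record_gc_state is a hypothetical helper for this demonstration only:

```python
import gc
import timeit

observed = []


def record_gc_state():
    # Hypothetical timed callable: notes whether GC is on while it runs.
    observed.append(gc.isenabled())


# Default behaviour: timeit switches GC off for the duration of the timing.
timeit.timeit(record_gc_state, number=1)

# Re-enable GC as the first statement of the setup string.
timeit.timeit(record_gc_state, setup="import gc; gc.enable()", number=1)

print(observed)  # [False, True]
```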

... or you can write your own custom timer using the time module.

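If you go with a custom timer, choose the clock carefully: on Python 2 and early Python 3, use time.clock() on Windows and time.time() on other platforms (timeit makes this choice internally and exposes it as timeit.default_timer). Since Python 3.3, timeit.default_timer is time.perf_counter() on every platform, and time.clock() was removed in Python 3.8.

import sys
import time

# choose timer to use
if hasattr(time, 'perf_counter'):
    # Python 3.3+: high-resolution, monotonic clock on every platform
    default_timer = time.perf_counter
elif sys.platform.startswith('win'):
    # older Pythons on Windows (time.clock() was removed in Python 3.8)
    default_timer = time.clock
else:
    default_timer = time.time

start = default_timer()
# do something
finish = default_timer()
elapsed = (finish - start)  # seconds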
Corey Goldberg
Thanks, Corey (btw, I follow you on Twitter (dimle)). I would prefer to use the timeit module, but I can't find an example anywhere of a situation similar to the one I am dealing with (which I find quite a common case).
myle
myle, I just added a link to http://diveintopython.org/performance_tuning/timeit.html; does that help?
Corey Goldberg
I had already read it. It didn't help enough (how do I pass arguments, or how do I initialize the environment?). timeit creates a new, isolated environment.
myle
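On the "isolated environment" point: by default, a statement string is executed in timeit's own module namespace, which is why names from the caller are not visible. Python 3.5 (later than this thread) added a globals parameter to choose that namespace. A minimal sketch, with StubProcessor again a hypothetical stand-in for LinguisticProcessor, that passes a merge of the module globals and the calling function's locals:

```python
import timeit


def raw_queries(queries, nlp):
    """Same shape as the question's raw_queries()."""
    for q in queries:
        nlp.query(q)


class StubProcessor:
    """Hypothetical stand-in for LinguisticProcessor: counts queries."""

    def __init__(self):
        self.seen = 0

    def query(self, q):
        self.seen += 1


def evaluate_queries(queries, nlp, number=1000):
    # Statement strings normally run in timeit's own namespace. Passing
    # globals= (Python 3.5+) makes raw_queries (a module global) and this
    # function's local queries/nlp visible to the statement.
    namespace = {**globals(), **locals()}
    return timeit.timeit("raw_queries(queries, nlp)",
                         globals=namespace, number=number)


nlp = StubProcessor()
elapsed = evaluate_queries(["a", "b"], nlp, number=100)
print(nlp.seen)  # 200
```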
+2  A: 

First, never use the time module to time functions. It can easily lead to wrong conclusions. See http://stackoverflow.com/questions/1622943/timeit-versus-timing-decorator for an example.
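One concrete pitfall behind this advice (not necessarily the one discussed in the linked question): time.time() reads the wall clock, which can be adjusted while you measure, whereas timeit's default clock is monotonic. Python 3.3+ can report this directly:

```python
import time
import timeit

# time.time() follows the system's wall clock, which can be adjusted
# (e.g. by NTP) mid-measurement; perf_counter() never goes backwards.
for name in ("time", "perf_counter"):
    info = time.get_clock_info(name)
    print(name, "monotonic:", info.monotonic, "resolution:", info.resolution)

# Since Python 3.3, timeit's default clock is time.perf_counter.
print(timeit.default_timer is time.perf_counter)  # True
```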

The easiest way to time a function call is to use IPython's %timeit command. There, you simply start an interactive IPython session, call phase2(), define queries, and then run

%timeit raw_queries(queries,nlp)

The second easiest way that I know to use timeit is to call it from the command line:

python -mtimeit -s"import test; queries, nlp = test.phase2()" "test.raw_queries(queries, nlp)"

(In the command above, I assume the script is called test.py.)

The idiom here is

python -mtimeit -s"SETUP_COMMANDS" "COMMAND_TO_BE_TIMED"

To be able to pass queries and nlp to the raw_queries function call, you have to define those variables. In the code you posted, queries is defined in phase2(), but only locally, and the LinguisticProcessor lives in the local variable b. So to make them available to the timed statement, you need to do something like have phase2 return them:

def phase2():
    ...
    return queries, b

If you don't want to mess up phase2 this way, create a dummy function:

def phase3():
    # Do the same setup as phase2(), but return the queries and the processor
    return queries, b
unutbu
Thanks! That worked like a charm: python3 -mtimeit -s"import main; (queries,b)=main.phase2()" "main.raw_queries(queries,b)"
myle
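The same SETUP / STATEMENT split from the working command can also be driven from inside a script. The sketch below roughly mimics what python -m timeit does in recent Pythons (Timer.autorange, Python 3.6+, picks the loop count, then the best of 5 repeats is reported); the setup and statement strings are placeholders, not the question's code:

```python
import timeit

# Placeholder setup/statement pair standing in for
# "import main; (queries, b) = main.phase2()" and "main.raw_queries(queries, b)".
setup = "data = list(range(1000))"
stmt = "sorted(data)"

t = timeit.Timer(stmt, setup)
# autorange() (Python 3.6+) grows the loop count until one run takes >= 0.2 s.
number, _ = t.autorange()
best = min(t.repeat(repeat=5, number=number)) / number
print(f"{number} loops, best of 5: {best * 1e6:.3g} usec per loop")
```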
A: 

You don't say so, but are you by any chance trying to make the code go faster? If so, I suggest you not focus on a particular routine and try to time it. Even if you get a number, it won't really tell you what to fix. If you can pause the program under the IDE several times and examine its state, including the call stack, it will tell you what is taking the time and why. Here is a link that gives a brief explanation of how and why it works.*

*When you follow the link, you may have to go to the bottom of the previous page of answers. SO is having trouble following a link to an answer.

Mike Dunlavey
Thanks for your suggestion, Mike. Your guess is right: I was trying to optimize raw_queries. That's why I was concentrating on it. I am positive that's where I should focus.
myle
@myle: Well, what I was trying to suggest is that you may have a guess that **raw_queries** is where the problem is, but you should put that guess aside, because stack samples will tell you where the problem is and you should believe them, regardless of your prior guess (which might or **might not** be what you need to fix).
Mike Dunlavey
I see your point. In other words, you suggest using a profiler before optimizing. Your point is valid. But in this case, I am only interested in raw_queries due to the nature of the project. The whole project is built around it (I was building something like a small search engine for educational purposes, and the queries are simple lookups).
myle
@myle: Even so, take stack samples. If you are right, they will tell you. If your concern is misplaced, they will tell you. If you are right about raw_queries, they will tell you **why**, and they will tell you what to fix to make it faster. Just measuring time gives you a number, but if your goal is to **reduce** that number you need something that tells you what to fix. That's what stack samples do. If you find this concept hard to grasp, you're not alone.
Mike Dunlavey
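Mike's manual procedure (pause the program several times and read the call stack) can be roughly approximated in pure Python with a background thread that snapshots the main thread's stack at intervals. This is a sketch of that idea, not a tool Mike recommends; sys._current_frames() is a CPython-specific, underscore-prefixed function, and slow_lookup, fast_part and workload are hypothetical workloads:

```python
import collections
import sys
import threading
import time


def sample_stacks(target, *args, interval=0.005):
    """Run target(*args) on the calling thread while a background thread
    snapshots that thread's call stack every `interval` seconds.

    Returns (counts, samples): counts maps a function name to the number
    of samples in which it appeared anywhere on the stack.
    """
    watched_id = threading.get_ident()
    counts = collections.Counter()
    samples = 0
    done = threading.Event()

    def sampler():
        nonlocal samples
        while not done.is_set():
            frame = sys._current_frames().get(watched_id)
            names = set()
            while frame is not None:
                names.add(frame.f_code.co_name)
                frame = frame.f_back
            counts.update(names)
            samples += 1
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        target(*args)
    finally:
        done.set()
        thread.join()
    return counts, samples


def slow_lookup(needles):
    # Deliberately slow: linear membership tests on a list.
    haystack = list(range(20000))
    return [n in haystack for n in needles]


def fast_part():
    return sum(range(100))


def workload():
    for _ in range(50):
        fast_part()
        slow_lookup([19999] * 10)


counts, samples = sample_stacks(workload)
print(samples, counts["slow_lookup"], counts["fast_part"])
```

A function that appears in most samples is where the time goes; recording frame.f_lineno alongside the name would also show which line is responsible, which is the "why" Mike refers to.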