views:

80

answers:

3

I have a script whose main for loop repeats about 15k times. On each iteration it queries a local MySQL database and does an SVN update on a local repository. I moved the SVN repository onto a RAMdisk, since previously most of the time seemed to be spent reading from and writing to disk.

Now the script runs at basically the same speed, but its CPU utilization never goes over 10%.

Process Explorer shows that mysqld is barely using any CPU time either, and is not doing much disk I/O.

What steps would you take to figure out where the bottleneck is?

+1  A: 

Profile your Python code. That will show you how long each function/method call takes. If that's the method call querying the MySQL database, you'll have a clue where to look. But it also may be something else. In any case, profiling is the usual approach to solve such problems.
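A minimal sketch of this approach using the standard cProfile and pstats modules; the `query_database` and `svn_update` functions here are hypothetical stand-ins for whatever the real loop body calls:

```python
import cProfile
import io
import pstats

def query_database(i):
    # Stand-in for the MySQL query done on each iteration.
    return i * 2

def svn_update():
    # Stand-in for the SVN update step.
    pass

def main_loop():
    for i in range(15000):
        query_database(i)
        svn_update()

profiler = cProfile.Profile()
profiler.enable()
main_loop()
profiler.disable()

# Print the 10 most time-consuming calls, sorted by cumulative time.
# The functions at the top of this list are where to look first.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```

In the real script, the `cumtime` column will show whether the database query, the SVN update, or something else dominates the 15k iterations.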

Eli Bendersky
I learned how to profile thanks to you, but the results I got were inconclusive (or I still don't know how to interpret them).
greye
+4  A: 

Doing 15k SQL queries in a for loop is a bottleneck in any language.

Is there any reason you issue a new query on every iteration? If you run a single query before the for loop and then iterate over the result set, doing the SVN part inside the loop, you will see a dramatic increase in speed.

But I doubt you will get higher CPU usage, because you are mostly doing I/O rather than computation. By the way, you can't see this in mysqld's CPU usage: the cost lies not in the complexity of the individual queries but in their count and the per-query latency of the server. You will only see very short, inexpensive queries that nevertheless add up over time.
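A sketch of the restructuring, using an in-memory sqlite3 database as a stand-in for MySQL (the table and column names are made up for illustration):

```python
import sqlite3

# Stand-in for the local MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, path TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, "file%d" % i) for i in range(5)])

# Slow pattern: one round trip to the database per loop iteration.
def per_iteration():
    rows = []
    for i in range(5):
        cur = conn.execute("SELECT path FROM items WHERE id = ?", (i,))
        rows.append(cur.fetchone()[0])
    return rows

# Faster pattern: a single query up front, then iterate over the result set.
def single_query():
    cur = conn.execute("SELECT path FROM items ORDER BY id")
    return [path for (path,) in cur]

# Both return the same data; the second avoids 15k round trips.
assert per_iteration() == single_query()
```

With 15k iterations, collapsing the per-iteration queries into one query removes 15k network/IPC round trips, which is latency the CPU never sees.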

Homer J. Simpson
+1  A: 

It is "well known", so to speak, that svn update sleeps for up to a whole second after it has finished running, so that file modification timestamps end up "in the past" (many filesystems don't have timestamp granularity finer than one second). You can find more information about it by Googling for "svn sleep_for_timestamps".

I don't have an obvious solution to suggest. If this is really performance-critical, you could either: 1) update less often than you are doing now, or 2) try to use a lower-level Subversion API (good luck).
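One way to check whether this post-update sleep is what's eating your loop time is to measure the wall-clock duration of each update. A minimal sketch; the working-copy path is hypothetical:

```python
import subprocess
import time

def timed_command(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.time()
    subprocess.call(cmd)
    return time.time() - start

# Hypothetical RAMdisk working copy; substitute your own path:
# elapsed = timed_command(["svn", "update", "/mnt/ramdisk/repo"])
# If each update consistently takes close to a second even when nothing
# changed, the sleep_for_timestamps delay is the likely culprit, and at
# 15k iterations it alone would account for hours of wall-clock time.
```

Over 15k iterations, even a fixed one-second pause per update dominates everything else, which would also explain the low CPU usage: the process is sleeping, not computing.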

Antoine P.