views:

128

answers:

4

I'm tempted to relabel this question 'Look at this brick. What type of house does it belong to?'

Here's the situation: I've effectively been asked to profile some subroutines having access to neither profilers (even Devel::DProf) nor Time::HiRes. The purpose of this exercise is to 'locate' bottlenecks.

At the moment, I'm sprinkling print statements at the beginning and end of each sub that log entries and exits to file, along with the result of the time function. Not ideal, but it's the best I can go by given the circumstances. At the very least it'll allow me to see how many times each sub is called.

The code is running under Unix. The closest thing I see to my need is perlfaq8, but that doesn't seem to help (I don't know how to make a syscall, and am wondering if it'll affect the code timing unpredictably).

Not your typical everyday SO question...

+7  A: 

This technique should work.

Basically, the idea is if you run Perl with the -d flag, it goes into the debugger. Then, when you run the program, ctrl-Break or ctrl-C should cause it to pause in the middle of whatever it is doing. Then you can type T to show the stack, and examine any other variables if you like, before continuing it.

Do this about 10 or 20 times. Any line of code (or any function, if you prefer) costing a significant percent of time will appear on that percent of stack samples, roughly, so you will not miss it.

For example, if a line of code (typically a function call) costs 20% of time, and you pause the program 20 times, you will see that line on 4 stack samples, give or take 1.8 samples. The amount of time that could be saved if you could avoid executing that line, or execute it a lot less, is a 20% reduction in overall execution time.

Then you can repeat it to find more problems.

You said the purpose is to 'locate' bottlenecks. This method does exactly that. Measuring function execution time is only a very indirect way to do that.

Mike Dunlavey
@Zaid: Just in case you are tempted to respond with a comment saying that's too much work, give it a try. You may be surprised at how easy and effective it is.
Mike Dunlavey
@Mike: Point well taken, but my superiors want times. I have a pretty good gut-feeling as to where the bottlenecks and excessive dwell times lie, but I need to quantify it.
Zaid
@Zaid: To get time, take the overall execution time (stopwatch) and multiply by the percent. If they want *precise* time they're missing the point. If you have a pretty good gut-feeling where the problems are, sampling will either prove you right, or tell you where the problem *really* is.
Mike Dunlavey
Maybe: run the program with -d asynchronously from another program and attach the to-be-profiled program's STDIN/STDOUT/STDERR to file handles. Then in the outer program, send INT signals to the one being profiled, parse the stuff printed to STDOUT/ERR, and write "r\n" to the STDIN of the program-being-profiled. Then do statistics based on the parsed back traces. It's probably annoying to get right, but might work.
tsee
@tsee: That should work. The statistics you want are, for each line of code, the percent of stack traces containing it. (If a line appears more than once on a trace, you only count the first appearance. That's because you need to ignore recursion.) Personally, I like to study individual stack traces and understand the *what* and *why* of what the program was doing at that instant. That's how I know if I can get rid of it.
Mike Dunlavey
@Mike, also, the automated gathering of any useful amount of statistics this way is likely to take forever. Zaid's requirements are just silly (not saying it's his fault). By the time he has a working profiler that gives him the stats he needs, he will have *rewritten* a profiling module. I mean, it's so silly that at some point, you could start using gdb to hook into perl, set breakpoints on "entersub" and walk the perl call stack yourself in C. Or compile your own Perl with the profiling code baked in...
tsee
@tsee: I don't even think of it as statistics. If I take 10 stack samples, I just look for some line of code that appears on >1 sample and write down on paper a bit of a call tree descending from that line, as indented text, or do it in a text editor. Next to each node I put a count of how many samples that LOC is seen in, like "(3)". That means the line costs 30% of total time, so it is a juicy target. I look to see if it is something I can shoot out. If there is a problem, and there usually is, I know this will find it. Zoom does this in a fancy way, but I don't need fancy.
Mike Dunlavey
+1  A: 
  1. As far as syscall, there's a pretty good example in this post: http://www.cpan.org/scripts/date_and_time/gettimeofday

    I think it's clear enough even for someone who never used syscall before (like myself :)

  2. May I ask what the specifics of "having no access" are?

    It's usually possible to get access to CPAN modules, even in cases where installing them in central location is not in the cards. Is there a problem with downloading the module? Installing it in your home directory? Using software with the module incuded?

    If one of those is a hang-up it can probably be fixed... if it's some company policy, that's priceless :(

DVK
@DVK: Long story short: this is not for personal use. NDA
Zaid
@Zaid - you can't fight The Man :(
DVK
@Zaid - I added an example of using syscall - see if that's something you can re-use...
DVK
A: 

Well, you can write your own profiler. It's not as bad as it sounds. A profiler is just a very special-case debugger. You want to read the perldebguts man page for some good first-cut code to get started if you must write your own.

What you want, and what your boss wants, though he or she may not know it, is to use Devel::NYTProf to do a really good job of profiling your code, and getting the job done instead of having to wait for you to partially duplicate the functions of it while learning how it is done.

The comment you made about "personal use" doesn't make sense. You're doing a job for work, and the work needs to get done, and you need (or your manager needs to get you) the resources to do that work. "Personal use" doesn't seem to enter into it.

Is it a question of someone else refusing to sign off on the module to have it installed on the machine running the software to be measured? Is it a licensing question? Is it not being allowed to install arbitrary software on a production machine (understandable, but there's got to be some way the software's tested before it goes live - I hope - profile it there)?

What is the reason that a well-known module from a trustworthy source can't be used? Have you made the money case to your manager that more money will be spent coding a new, less-functional, profiler from scratch than finding a way to use one that is both good and already available?

Joe McMahon
I completely agree that `Devel::NYTProf` would make this task a walk in the park. I wish I could say more, but my hands (and tongue) are tied. Thanks anyway.
Zaid
A: 

For each subroutine, create a wrapper around it which reports the time in some format which you can export to something like R, a database, Excel or something similar (CSV would be a good choice). Add something like this to your code. If you are using a Perl less than 5.7 (when Time::HiRes was first added to core), use syscall as mentioned above instead of Time::HiRes's functions below.

INIT {
    sub wrap_sub {
        no strict 'refs';
        my $sub = shift;
        my $subref = *{$sub}{CODE};
        return sub {
            local *__ANON__ = "wrapped_$sub";
            my $fsecs = Time::HiRes::gettimeofday();
            print STDERR "$sub,$fsecs,";
            if (wantarray) {
               @return = eval { $subref->(@_) } or die $@;
            } else {
               $return[0] = eval { $subref->(@_) } or die $@;
            }

            $fsecs = Time::HiRes::gettimeofday();
            print STDERR "$fsecs\n";
            return wantarray ? @return : $return[0];
        };
    }
    require Time::HiRes;
    my @subs = qw{the subs you want to profile};
    no strict 'refs';
    no warnings 'redefine';
    foreach my $sub (@subs) {
        *{$sub} = wrap_sub($sub);
    }    
}

Replace 'subs you want to profile' with the subs you need profiled, and use an open()ed file handle instead of STDERR if you need to, bearing in mind you can get the results of the run separate from the output of the script (on Unix, with the bourne, korn and bash shells), like this

perl ./myscript.pl 2>myscript.profile
MkV