views:

710

answers:

3

I am trying to start using erlang:trace/3 and the dbg module to trace the behaviour of a live production system without taking the server down.

The documentation is opaque (to put it mildly) and there don't appear to be any useful tutorials online.

What I spent all day trying to do was capture what was happening in a particular function by trying to apply a trace to module:function using dbg:c and dbg:p but with no success at all...

Does anyone have a succinct explanation of how to use trace in a live Erlang system?

+5  A: 

The basic steps of tracing for function calls are on a non-live node:

> dbg:start().   % start dbg
> dbg:tracer().  % start a simple tracer process
> dbg:tp(Module, Function, Arity, []).   % specify MFA you are interested in
> dbg:p(all, c).   % trace calls (c) of that MFA for all processes.

... trace here

> dbg:stop_clear().   % stop tracer and clear effect of tp and p calls.

You can trace for multiple functions at the same time. Add functions by calling tp for each function. If you want to trace for non-exported functions, you need to call tpl. To remove functions, call ctp or ctpl in a similar manner. Some general tp calls are:

> dbg:tpl(Module, '_', []).  % all calls in Module
> dbg:tpl(Module, Function, '_', []).   % all calls to Module:Function with any arity.
> dbg:tpl(Module, Function, Arity, []). % all calls to Module:Function/Arity.
> dbg:tpl(M, F, A, [{'_', [], [{return_trace}]}]).   % same as before, but also show return value.

The last argument is a match specification. You can play around with that by using dbg:fun2ms.

You can select the processes to trace on with the call to p(). The items are described under erlang:trace. Some calls are:

> dbg:p(all, c).   % trace calls to selected functions by all functions
> dbg:p(new, c).   % trace calls by processes spawned from now on
> dbg:p(Pid, c).   % trace calls by given process
> dbg:p(Pid, [c, m]).  % trace calls and messages of a given process

I guess you will never need to directly call erlang:trace, as dbg does pretty much everything for you.

A golden rule for a live node is to generate only an amount of trace output to the shell, which lets you to type in dbg:stop_clear().. :)

I often use a tracer that will auto-stop itself after a number of events. For example:

dbg:tracer(process, {fun (_,100) -> dbg:stop_clear();
                        (Msg, N) -> io:format("~p~n", [Msg]), N+1 end, 0
                    }).

If you are looking for debugging on remote nodes (or multiple nodes), search for pan, eper, inviso or onviso.

Zed
I haven't been able to get trace to send *anything* to the shell so far :(
Gordon Guthrie
You say those are the basic steps on a non-live node... presumably because on a live node you would just get into a scroll-off with the shell (and lose)?
Gordon Guthrie
Tracing requires some resources from the system. If you trace for frequent events you might risk your performance. In the worst case you will lose contact, and it goes -Boom-.
Zed
On a live system I would be very specific about the function calls, including some match on arguments. Also most of the times you can limit tracing to one process instead of all.
Zed
You can actually just use dbg for cluster/remote node debugging. `dbg:n(Node)` sets up `Node` for the same tracing as the node you're on.
archaelus
My other general adice would be: You can use call tracing (c flag) with all processes if you setup a trace pattern for one of your own modules (system/trace processes won't be calling it), but only use message (send/receive/message flag) with a small set of processes (that doesn't include any system/trace processes).
archaelus
+1  A: 

On live systems we rarely trace to shell. If the system is well configured then it is already collecting your Erlang logs that were printed to the shell. I need not emphasize why this is crucial in any live node...

Let me elaborate on tracing to files:

It is possible to trace to file, which will produce a binary output that can be converted and parsed later. (for further analysis or automated controlling system, etc.)

An example could be:

  • Trace to multiple files wrapped (12x50 Mbytes).Please always check the available disk space before using such a big trace!

    dbg:tracer(port,dbg:trace_port(file,{"/log/trace",wrap,atom_to_list(node()),50000000,12})).
    

    dbg:p(all,[call,timestamp,return_to]).

    • Always test on a test node before entering anything to a live node's shell!
    • It is most advised to have a test node or replica node to try the sripts first.

That said let's have a look at a basic tracing command sequence:

<1> dbg:stop_clear().

  • Always start by flushing trace ports and ensuring that no previous tracing interferes with the current trace.

<2> dbg:tracer().

  • Start the tracer process.

<3> dbg:p(all,[call, timestamp]).

  • In this case we are tracing for all processes and for function calls.

<4> dbg:tp( ... ).

  • As seen in Zed's answer.

<5> dbg:tpl( ... ).

  • As seen in Zed's answer.

<42> dbg:stop_clear().

  • Again it is to ensure that all traces were written to the output and to evade any later inconvenience.

You can:

  • add triggers by defining some fun()-s in the shell to stop the trace at a given time or event. Recursive fun()-s are the best to achieve this, but be very careful when applying those.

  • apply a vast variety of pattern matching to ensure that you only trace for the specific process with the specific function call with the specific type of arguments...

I had an issue a while back, when we had to check the content of an ETS table and on appearance of a certain entry we had to stop the trace within 2-3 minutes.

I also suggest the book Erlang Programming written by Francesco Cesarini. (Erlang Programming @ Amazon)

Dlf
+2  A: 

The 'dbg' module is quite low-level stuff. There are two hacks that I use very frequently for the tasks that I commonly need.

  1. Use the Erlang CLI/shell expansion code at http://www.snookles.com/erlang/user_default.erl. It was originally written (as far as I know) by Serge Aleynikov and has been a useful "so that's how I add custom functions to the shell" example. Compile the module and edit your ~/.erlang file to point to its path (see comment at the top of the file).

  2. Use the "redbug" utility that's bundled with in the EPER collection of utilities. It's very easy to use 'dbg' to create millions of trace events in a few seconds. Doing so in a production environment can be disastrous. For development or production use, redbug makes it nearly impossible to kill a running system with a trace-induced overload. !@#$@! anti-spam measures ... go to Google Code and search for the "eper" project for the code.

Scott Lystig Fritchie