views:

760

answers:

5

I have some threaded code using PThreads on Linux that, I suspect, is suffering from excessive lock contention. What tools are available for me to measure this?

Solaris has DTrace and plockstat. Is there something similar on Linux? (I know about a recent DTrace port for Linux but it doesn't seem to be ready for prime time yet.)

+1  A: 

In the absence of DTrace, your best bet is probably SystemTap. Here's a positive write up.

http://davidcarterca.wordpress.com/2009/05/27/systemtap/

ire_and_curses
I'm afraid that blog post explicitely mentions that it wasn't very useful for measuring lock contention.
Luís Oliveira
In comments he says he didn't try resolving symbols though.
Eugene
Good point, I'll give it a try.
Luís Oliveira
+1  A: 

After not having much luck with SystemTap, I decided to try and use the DTrace Linux port with some success, despite the lack of a plockstat provider. The following DTrace script is not quite a plockstat replacement but it managed to show me some of the information I was after.

#!/usr/sbin/dtrace -s 

/* Usage: ./futex.d '"execname"' */

long total;

END
{
    printf("total time spent on futex(): %ldms\n", total);
}

/* arg1 == 0 means FUTEX_WAIT */
syscall::futex:entry
/execname == $1 && arg1 == 0/
{
    self->start = timestamp;
}

syscall::futex:return
/self->start/
{
    this->elapsed = (timestamp - self->start) / 1000000;
    @[execname] = quantize(this->elapsed);
    total += this->elapsed;
    self->start = 0;
}

Here's an example using the above DTrace script to measure time spent in FUTEX_WAIT for a simple test program from this DTrace article.

$ ./futex.d '"mutex-test"'
dtrace: script './futex.d' matched 3 probes
^C
CPU     ID                    FUNCTION:NAME
  1      2                             :END total time spent on futex(): 11200ms


  mutex-test                                        
           value  ------------- Distribution ------------- count    
             128 |                                         0        
             256 |@@@@@@@@@@@@@@@@@@@@                     1        
             512 |                                         0        
            1024 |                                         0        
            2048 |                                         0        
            4096 |                                         0        
            8192 |@@@@@@@@@@@@@@@@@@@@                     1        
           16384 |                                         0

Definitely not great, but at least it's a starting point.

Luís Oliveira
+1  A: 

valgrind latest versions has a lock contention and lock validation tools:

http://valgrind.org/docs/manual/drd-manual.html

Which is great if you can produce the issue under Valgrind (it effects code run time speed) and have enough memory to run Valgrind.

For other uses, the more hard core Linux Trace Toolkit NG is recommended:

http://ltt.polymtl.ca/

Cheers, Gilad

gby
Thanks for your suggestion but valgrind really doesn't like the application I'm testing and barfs really quickly.
Luís Oliveira
A: 

The latest version of systemtap comes with lots of example scripts. One in particular seems like it would server as a good starting point for helping you accomplish your task:

#! /usr/bin/env stap

global thread_thislock 
global thread_blocktime 
global FUTEX_WAIT = 0

global lock_waits
global process_names

probe syscall.futex {  
  if (op != FUTEX_WAIT) next
  t = tid ()
  process_names[pid()] = execname()
  thread_thislock[t] = $uaddr
  thread_blocktime[t] = gettimeofday_us()
}

probe syscall.futex.return {  
  t = tid()
  ts = thread_blocktime[t]
  if (ts) {
    elapsed = gettimeofday_us() - ts
    lock_waits[pid(), thread_thislock[t]] <<< elapsed
    delete thread_blocktime[t]
    delete thread_thislock[t]
  }
}

probe end {
  foreach ([pid+, lock] in lock_waits) 
    printf ("%s[%d] lock %p contended %d times, %d avg us\n",
            process_names[pid], pid, lock, @count(lock_waits[pid,lock]),
            @avg(lock_waits[pid,lock]))
}

I was attempting to diagnose something similar with a MySQL process previously and observed output similar to the following using the above script:

mysqld[3991] lock 0x000000000a1589e0 contended 45 times, 3 avg us
mysqld[3991] lock 0x000000004ad289d0 contended 1 times, 3 avg us

While the above script collects information on all processes running on the system, it would be quite easy to modify it to only work on a certain process or executable. For example, we could change the script to take a process ID argument and modify the probe on entering the futex call to look like:

probe begin {
  process_id = strtol(@1, 10)
}

probe syscall.futex {
  if (pid() == process_id && op == FUTEX_WAIT) {
    t = tid ()
    process_names[process_id] = execname()
    thread_thislock[t] = $uaddr
    thread_blocktime[t] = gettimeofday_us()
  }
}

Obviously, you could modify the script lots of ways to suit what you want to do. I'd encourage you to have a look at the various example scripts for SystemTap. They are probably the best starting point.

posulliv
A: 

mutrace is the tool: http://0pointer.de/blog/projects/mutrace.html

Its easy to build, install and use.