ansaurus

Question

Best way to watch process (and sub-processes) for file system read() I/O?

Answer 1

+5 A:

Try "Process Monitor" (procmon.exe). It lets you specify a filter (the name of the process to watch) and then lists all the files that process touches, along with the operations performed on them.

On Linux, try lsof for a snapshot of the currently open files and strace for continuous monitoring. You'll have to filter the output with grep.

Snapshot tools such as lsof check the process structure (i.e. the data structure which the OS uses to manage a process) and enumerate the handles/file descriptors listed there, while tracers such as strace intercept the system calls as the process makes them. Either way, this is not a function of the filesystem API but of the process management API.
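As a rough illustration of enumerating a process's file descriptors, here is a minimal Python sketch. It assumes Linux, where each process exposes its descriptors as symlinks under /proc/<pid>/fd; the function name list_open_files is made up for this example, and this is not how any particular tool is implemented. Like lsof, it only gives a snapshot, so it will miss files that are opened and closed between two calls.

```python
import os

def list_open_files(pid):
    """Return {fd_number: target} for the descriptors a process holds open.

    Assumption: Linux only, relies on the /proc/<pid>/fd directory of symlinks.
    Reading another user's process usually requires root.
    """
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result

if __name__ == "__main__":
    # Print a snapshot of this script's own descriptors.
    for fd, target in sorted(list_open_files(os.getpid()).items()):
        print(fd, target)
```

Targets that are not regular files show up as strings such as "pipe:[...]" or "socket:[...]", so a caller interested only in files would have to filter those out.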

[EDIT] See the section "How does it work" on this page to get started writing your own tool on Windows.

Aaron Digulla 2009-09-17 15:41:44

I know about procmon, but it only works for a specified time period. Also, this is something I want to figure out how to implement in my own code.

Garen 2009-09-17 17:53:39

OpenedFilesView v1.45 comes with an explanation of how to do it on Windows (see my edits)

Aaron Digulla 2009-09-18 07:32:41

The web page mentions that it uses the "NtQuerySystemInformation API" and has a kernel driver (NirSoftOpenedFilesDriver.sys) but has no other info I can see on how to do it programmatically. That's a good clue for Windows, but not enough to get me started as a developer that's never done Windows kernel drivers.

Garen 2009-09-25 00:17:52

Ask the author and reuse the existing driver?

Aaron Digulla 2009-09-25 07:09:38

Sent. Hopefully he replies. :)

Garen 2009-10-01 03:30:51

Answer 2

+5 A:

On Linux, I'd definitely use strace -- it's simple and powerful. E.g.:

$ strace -o/tmp/blah -f -eopen,read bash -c "cat ciao.txt"

runs the requested command (including the subprocesses it spawns, due to -f) and leaves a log in /tmp/blah (120 lines in my case for this example) detailing all the open and read calls made by these processes, and their results.

You do need a little processing afterwards to extract just the set of files that were successfully read, as you require; for example, with Python, you could do:

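import re

# Matches lines written by "strace -f -o FILE", e.g.:
#   1234 open("ciao.txt", O_RDONLY) = 3
linere = re.compile(r'^(\d+)\s+(\w+)\(([^)]+)\)\s+=\s*(.*)$')

def main():
    openfiles = {}     # fd -> file name, from successful open calls
    filesread = set()  # files that had at least one non-empty read
    with open('/tmp/blah') as f:
        for line in f:
            mo = linere.match(line)
            if mo is None:
                print("Unmatched line %r" % line)
                continue
            pid, command, args, results = mo.groups()
            if command == 'open':
                fn = args.split(',', 1)[0].strip('"')
                fd = results.split(' ', 1)[0]
                if fd != '-1':  # ignore failed opens
                    openfiles[fd] = fn
            elif command == 'read':
                fd = args.split(',', 1)[0]
                nbytes = results.split(' ', 1)[0]
                # Count only reads that returned data on a descriptor we saw opened.
                if nbytes not in ('0', '-1') and fd in openfiles:
                    filesread.add(openfiles[fd])
            else:
                print("Unknown command %r" % command)
    print(sorted(filesread))

if __name__ == '__main__':
    main()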

This is a bit oversimplified (you need to watch some other syscalls such as dup &c) but, I hope, shows the gist of the work needed. In my example, this emits:

['/lib/libc.so.6', '/lib/libdl.so.2', '/lib/libncurses.so.5',
 '/proc/meminfo', '/proc/sys/kernel/ngroups_max',
 '/usr/share/locale/locale.alias', 'ciao.txt']

so it also counts as "reads" those that are done to get dynamic libraries &c, not just "data files"... at syscall level, there's little difference. I imagine you could filter non-data files out, if that's what you need.
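One possible way to filter out the non-data files is a simple prefix test, sketched below. The prefix list and the names SYSTEM_PREFIXES, is_data_file and data_files_only are assumptions made up for this example; what counts as a "data file" depends on the system and on what you are looking for, so treat the list as a starting point.

```python
# Assumed list of locations that hold libraries, kernel pseudo-files and
# locale data rather than user data; adjust it for the system at hand.
SYSTEM_PREFIXES = (
    "/proc/", "/sys/", "/dev/",
    "/lib/", "/lib64/", "/usr/lib/",
    "/usr/share/locale/",
)

def is_data_file(path, prefixes=SYSTEM_PREFIXES):
    """True if the path does not live under one of the system prefixes."""
    return not path.startswith(prefixes)  # str.startswith accepts a tuple

def data_files_only(paths, prefixes=SYSTEM_PREFIXES):
    """Return the sorted subset of paths that look like data files."""
    return sorted(p for p in paths if is_data_file(p, prefixes))

if __name__ == "__main__":
    filesread = [
        "/lib/libc.so.6", "/lib/libdl.so.2", "/lib/libncurses.so.5",
        "/proc/meminfo", "/proc/sys/kernel/ngroups_max",
        "/usr/share/locale/locale.alias", "ciao.txt",
    ]
    print(data_files_only(filesread))  # prints ['ciao.txt']
```

Relative paths such as ciao.txt always pass this test, which suits the example but would let through a library opened via a relative path.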

I find strace so handy for such purposes that, were I tasked to do the same job on Windows, my first try would be to go for StraceNT -- not 100% compatible, and of course the underlying syscall names &c differ, but I think I could account for these differences in my Python code (preparing and executing the strace command, and post-processing the results).
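For the "preparing and executing the strace command" part on Linux, a minimal sketch could look like the following. The helper names build_strace_command and run_traced are invented for this example; the strace options used (-o, -f, -e trace=...) are the same ones as in the command line shown earlier, and strace itself must be installed for run_traced to work.

```python
import subprocess

def build_strace_command(command, log_path, syscalls=("open", "read")):
    """Build the argv list for tracing `command` and its subprocesses.

    -o writes the trace to log_path, -f follows child processes, and
    -e trace=... restricts the log to the listed system calls.
    """
    return (["strace", "-o", log_path, "-f",
             "-e", "trace=" + ",".join(syscalls)]
            + list(command))

def run_traced(command, log_path, syscalls=("open", "read")):
    """Run `command` under strace and return its exit status."""
    return subprocess.call(build_strace_command(command, log_path, syscalls))

# Example usage (assumes strace is on the PATH):
#   run_traced(["bash", "-c", "cat ciao.txt"], "/tmp/blah")
```

On recent Linux systems many programs open files through the openat syscall rather than open, so you may want to add it to the syscalls tuple; the post-processing script would then need to look at the second argument of openat for the file name.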

Unfortunately, some other Unix systems, to my knowledge, only offer this kind of facility if you're root (super-user) -- e.g. on Mac OS X you need to go via sudo in order to execute such tracing utilities as dtrace and dtruss; I don't know of a straightforward port of strace to the Mac, nor of other ways to perform such tasks without root privileges.

Alex Martelli 2009-09-27 18:47:06
