I have a bunch of flows and data processing applications that I occasionally need to spy on, meaning I need to know what files they read. This is mostly to aid in packaging testcases, but can also be useful when debugging.

Is there a way to run the executables in such a way that produces such a list?

I have two thoughts on this:

  1. There is a command that I can invoke, and that command invokes my apps. Something along the lines of GDB: I call GDB, give it a path to the executable and some arguments, and GDB runs it for me. Perhaps there's something similar that tells me how system resources are used.
  2. Maybe the more interesting (but unnecessary side path) solution.
    1. create library called libc.so which implements fopen (and some others)
    2. change LD_LIBRARY_PATH to point at the new library
    3. make a copy of the real libc.so and rename fopen (nepof, perhaps) in an editor
    4. my library loads the copy and calls the renamed function as necessary to provide fopen functionality.
    5. call the app which then calls my proxy fopen.

Alternative #1 would certainly be the preferable one but comments on how to do #2 more easily are welcome too.

+4  A: 
man strace

example (assume 2343 is the process id):

# logging part (-ff writes one file per process: strace_log.txt.<pid>)
strace -p 2343 -ff -o strace_log.txt

# displaying part
grep -h '^open' strace_log.txt.*
Toader Mihai Claudiu
+6  A: 

One option is to use strace:

strace -o logfile -eopen yourapp

This logs every file-open event, but it imposes a performance penalty that may be significant. It does have the advantage of being easy to use.

Another option is to use LD_PRELOAD. This corresponds to your option #2. The basic idea is to do something like this:

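#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>

/* open() is variadic: a mode argument follows when O_CREAT is set. */
int open(const char *fn, int flags, ...) {
    static int (*real_open)(const char *fn, int flags, ...);
    mode_t mode = 0;

    if (!real_open) {
        /* Find the next open() in lookup order, i.e. libc's. */
        real_open = dlsym(RTLD_NEXT, "open");
    }

    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    fprintf(stderr, "opened file '%s'\n", fn);
    return real_open(fn, flags, mode);
}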

Then build with:

gcc -fPIC -shared -o preload-example.so preload-example.c -ldl

And run your program with eg:

$ LD_PRELOAD=$PWD/preload-example.so cat /dev/null
opened file '/dev/null'

This has much less overhead.

Note, however, that there are other entry points for opening files, e.g. fopen(), openat(), or one of the many legacy compatibility entry points:

00000000000747d0 g    DF .text      000000000000071c  GLIBC_2.2.5 _IO_file_fopen
0000000000068850 g    DF .text      000000000000000a  GLIBC_2.2.5 fopen
000000000006fe60 g    DF .text      00000000000000e2  GLIBC_2.4   open_wmemstream
00000000001209c0  w   DF .text      00000000000000ec  GLIBC_2.2.5 posix_openpt
0000000000069e50 g    DF .text      00000000000003fb  GLIBC_2.2.5 _IO_proc_open
00000000000dcf70 g    DF .text      0000000000000021  GLIBC_2.7   __open64_2
0000000000068a10 g    DF .text      00000000000000f5  GLIBC_2.2.5 fopencookie
000000000006a250 g    DF .text      000000000000009b  GLIBC_2.2.5 popen
00000000000d7b10  w   DF .text      0000000000000080  GLIBC_2.2.5 __open64
0000000000068850 g    DF .text      000000000000000a  GLIBC_2.2.5 _IO_fopen
00000000000d7e70  w   DF .text      0000000000000020  GLIBC_2.7   __openat64_2
00000000000e1ef0 g    DF .text      000000000000005b  GLIBC_2.2.5 openlog
00000000000d7b10  w   DF .text      0000000000000080  GLIBC_2.2.5 open64
0000000000370c10 g    DO .bss       0000000000000008  GLIBC_PRIVATE _dl_open_hook
0000000000031680 g    DF .text      0000000000000240  GLIBC_2.2.5 catopen
000000000006a250 g    DF .text      000000000000009b  GLIBC_2.2.5 _IO_popen
0000000000071af0 g    DF .text      000000000000026a  GLIBC_2.2.5 freopen64
00000000000723a0 g    DF .text      0000000000000183  GLIBC_2.2.5 fmemopen
00000000000a44f0  w   DF .text      0000000000000088  GLIBC_2.4   fdopendir
00000000000d7e70 g    DF .text      0000000000000020  GLIBC_2.7   __openat_2
00000000000a3d00  w   DF .text      0000000000000095  GLIBC_2.2.5 opendir
00000000000dcf40 g    DF .text      0000000000000021  GLIBC_2.7   __open_2
00000000000d7b10  w   DF .text      0000000000000080  GLIBC_2.2.5 __open
0000000000074370 g    DF .text      00000000000000d7  GLIBC_2.2.5 _IO_file_open
0000000000070b40 g    DF .text      00000000000000d2  GLIBC_2.2.5 open_memstream
0000000000070450 g    DF .text      0000000000000272  GLIBC_2.2.5 freopen
00000000000318c0 g    DF .text      00000000000008c4  GLIBC_PRIVATE __open_catalog
00000000000d7b10  w   DF .text      0000000000000080  GLIBC_2.2.5 open
0000000000067e80 g    DF .text      0000000000000332  GLIBC_2.2.5 fdopen
000000000001e9b0 g    DF .text      00000000000003f5  GLIBC_2.2.5 iconv_open
00000000000daca0 g    DF .text      000000000000067b  GLIBC_2.2.5 fts_open
00000000000d7d60  w   DF .text      0000000000000109  GLIBC_2.4   openat
0000000000068850  w   DF .text      000000000000000a  GLIBC_2.2.5 fopen64
00000000000d7d60  w   DF .text      0000000000000109  GLIBC_2.4   openat64
00000000000d6490 g    DF .text      00000000000000b6  GLIBC_2.2.5 posix_spawn_file_actions_addopen
0000000000121b80 g    DF .text      000000000000008a  GLIBC_PRIVATE __libc_dlopen_mode
0000000000067e80 g    DF .text      0000000000000332  GLIBC_2.2.5 _IO_fdopen

You may need to hook all of these for completeness; at the very least, hook the ones not prefixed with _. In particular, be sure to hook fopen separately, as the libc-internal call from fopen() to open() is not hooked by an LD_PRELOAD library.
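
As a rough illustration of hooking fopen separately, here is a minimal sketch that follows the same pattern as the open() hook. It is only one possible shape, not a complete solution: it assumes glibc, logs to stderr, and does not cover fopen64, freopen or the other entry points listed above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Sketch of an fopen() hook: log the call, then forward to the real fopen. */
FILE *fopen(const char *path, const char *mode) {
    static FILE *(*real_fopen)(const char *, const char *);

    if (!real_fopen) {
        /* Find the next fopen() in lookup order, normally libc's. */
        real_fopen = (FILE *(*)(const char *, const char *))dlsym(RTLD_NEXT, "fopen");
        assert(real_fopen != NULL);
    }

    fprintf(stderr, "fopen('%s', '%s')\n", path, mode);
    return real_fopen(path, mode);
}
```

It can live in the same source file as the open() hook and be built and preloaded the same way.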

A similar caveat applies to strace: there is the 'openat' syscall too, and depending on your architecture there may be other legacy syscalls. There are fewer of them than with LD_PRELOAD hooks, though, so if you don't mind the performance hit, strace may be the easier option.

bdonlan
A: 

What I use is something like:

strace -o file.txt ./command

You can then

grep open file.txt

to get a list of all the files that the program opened.
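
If you want just the file names rather than whole strace lines, a small filter can pull the quoted path out of each open/openat line. The following is only a sketch: extract_opened_path is a made-up helper name, and it assumes the usual strace line shape, such as open("/dev/null", O_RDONLY) = 3, with no PID prefix on the line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the quoted path from an open()/openat() line
 * of strace output into out. Returns 1 on success, 0 otherwise. */
int extract_opened_path(const char *line, char *out, size_t outsz) {
    if (outsz == 0 ||
        (strncmp(line, "open(", 5) != 0 && strncmp(line, "openat(", 7) != 0)) {
        return 0;
    }

    const char *p = strchr(line, '"');    /* opening quote of the path */
    if (!p) return 0;
    p++;

    size_t n = 0;
    while (*p && *p != '"') {
        if (*p == '\\' && p[1]) {         /* copy escape sequences verbatim */
            if (n + 1 >= outsz) return 0;
            out[n++] = *p++;
        }
        if (n + 1 >= outsz) return 0;     /* path does not fit in out */
        out[n++] = *p++;
    }
    if (*p != '"') return 0;              /* unterminated string */
    out[n] = '\0';
    return 1;
}
```

Calling it on each line read from file.txt with fgets and printing the result when it returns 1 gives a plain list of paths. Lines for failed opens (those ending in -1 ENOENT and so on) still match, so filter on the return value in the line if you only want files that were actually opened.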

Arcterex