Both "netstat -p" and "lsof -n -i -P" seem to readlink every process's file descriptors, much like running stat /proc/*/fd/*.

How to do it more efficiently?

My program wants to know which process is connecting to it. Traversing all processes again and again seems too inefficient.

Suggestions involving iptables or kernel patches are welcome too.

+3  A: 

Take a look at this answer, where various methods and programs that perform socket to process mappings are mentioned. You might also try several additional techniques to improve performance:

  1. Caching the file descriptors in /proc, and the information in /proc/net. This is done by the programs mentioned in the linked answer, but is only viable if your process lasts more than a few seconds.
  2. You might try getpeername(), but this relies on you knowing the possible endpoints and which processes they map to. Your question suggests that you are connecting sockets locally. If so, you might try Unix sockets, which let you receive the credentials of a peer when exchanging messages once you enable SO_PASSCRED with setsockopt(). Take a look at these examples (they're pretty nasty, but the best I could find).
  3. Take a look at fs/proc/base.c in the Linux kernel. This is the heart of the information returned by a readlink on a file descriptor in /proc/PID/fd/FD. A significant part of the overhead is passing the requests up and down the VFS layer, the extensive locking of all the kernel data structures that provide the information, and the stringifying and destringifying at the kernel's end and yours respectively. You might adapt some of the code in this file to generate this information without many of the intermediate layers, in particular reducing the locking to once per process, or simply once per scan of the entire data set you're after.
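To illustrate the Unix socket option in item 2, here is a minimal Python sketch, assuming Linux. The helper names `recv_with_credentials` and `demo` are made up for illustration, and the `"iII"` layout is an assumption about `struct ucred` (pid, uid, gid) on common Linux ABIs. It only applies when both ends talk over a Unix domain socket.

```python
import os
import socket
import struct

# Assumed layout of Linux's struct ucred: pid_t pid, uid_t uid, gid_t gid.
UCRED = struct.Struct("iII")


def recv_with_credentials(sock, bufsize=4096):
    """Receive one message plus the sender's (pid, uid, gid), if attached.

    Hypothetical helper: SO_PASSCRED must already be enabled on `sock`.
    """
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(UCRED.size)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            return data, UCRED.unpack(cdata[:UCRED.size])
    return data, None


def demo():
    """Send a message to ourselves over a socket pair and read the credentials."""
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with receiver, sender:
        # Ask the kernel to attach SCM_CREDENTIALS to incoming messages.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        sender.sendall(b"hello")
        return recv_with_credentials(receiver)
```

Calling `demo()` returns the payload together with a `(pid, uid, gid)` tuple describing the sending process, so no scan of /proc is needed for this transport.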

My personal recommendation is to just brute force it for now. Ideally, traverse the processes in /proc in reverse numerical order, as the more recent and interesting processes will tend to have higher PIDs, and return as soon as you've located the result you're after. Doing this once per incoming connection is relatively cheap, though it really depends on how performance-critical your application is. You'll definitely find it worthwhile to bypass netstat, parse the new connection directly from /proc/net/PROTO, and then locate the socket in /proc/PID/fd. If all your traffic is localhost, just switch to Unix sockets and get the credentials directly. I'd save writing a new syscall or proc module that dumps huge amounts of file descriptor data for last.
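The brute-force approach described above might look roughly like this in Python. This is a sketch under assumptions: Linux, IPv4 only (/proc/net/tcp), and the function names `socket_inode` and `pid_for_inode` are invented for illustration. Processes whose fd directories you cannot read are silently skipped.

```python
import os
import socket
import struct


def _decode(addr_hex):
    """Turn a /proc/net/tcp address such as '0100007F:1F90' into (ip, port)."""
    ip_hex, port_hex = addr_hex.split(":")
    # The kernel prints the raw 32-bit address in host byte order,
    # so packing it back in native order recovers the network-order bytes.
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def socket_inode(local, remote, table="/proc/net/tcp"):
    """Return the inode of the socket with the given (ip, port) endpoints."""
    with open(table) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if _decode(fields[1]) == local and _decode(fields[2]) == remote:
                return int(fields[9])  # inode column; 0 if no longer attached
    return None


def pid_for_inode(inode):
    """Scan /proc/PID/fd from the highest PID down; stop at the first match."""
    target = f"socket:[{inode}]"
    pids = sorted((int(n) for n in os.listdir("/proc") if n.isdigit()),
                  reverse=True)
    for pid in pids:
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    return pid
            except OSError:
                continue  # fd was closed while we were scanning
    return None
```

For an accepted connection `conn` with peer address `peer`, the client's socket is the table entry whose local address is `peer` and whose remote address is `conn.getsockname()`, so `pid_for_inode(socket_inode(peer, conn.getsockname()))` would give the connecting PID for a local client. With `-j REDIRECT` the client's remote address is the original destination rather than your listening address, so the match on the remote endpoint would need adjusting.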

Matt Joiner
Option 2 won't work here, and neither will Unix sockets. The program catches connections redirected by "-j REDIRECT" and has to show the user which program each connection belongs to (and apply policies depending on the process name). E.g. if firefox then high prio; if qbittorrent then low prio.
Vi
"Doing this once per incoming connection is relatively cheap": even if it is, the "strace" output in the console will not look as nice as before. The majority of the syscalls would be in vain (only a few would be useful).
Vi
@Vi: Looks and actuality are two different things. You might notice that any C program "looks" very bad in strace. Since you haven't actually implemented this yet, you're only observing how slow netstat appears to be. You'll find this has nothing to do with the overhead of the fd lookup in /proc, and everything to do with the reverse host name lookups (pass `-n` to bypass this). You should profile your eventual app before you blame this part of your program.
Matt Joiner