We have a long-standing bug in our production code, which is essentially a socket-based daemon. It listens on a bunch of file descriptors using select.

Occasionally (once a day or so), select will return with EBADF.

I have written code to search for the bad file descriptor: it loops over each fd and calls select on it. These calls never return EBADF. I also tried fstat, which never returns EBADF either.
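A minimal sketch of that kind of scan, for reference. The helper name find_bad_fds and its signature are made up for illustration, and it uses fcntl(F_GETFD) rather than select or fstat as the probe. Note that a scan run after the failure can only report descriptors that are invalid at the moment it runs; if the fd number has been reused by a later open or accept in the meantime, the probe will see a valid descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical helper: copies every descriptor in fds[0..n) that the kernel
 * reports as not open (EBADF) into bad[], and returns how many were found.
 * bad[] must have room for n entries. */
size_t find_bad_fds(const int *fds, size_t n, int *bad)
{
    size_t count = 0;

    assert(bad != NULL);
    for (size_t i = 0; i < n; i++) {
        /* F_GETFD only reads the descriptor flags, so the probe has no
         * side effects on an fd that turns out to be valid. */
        if (fcntl(fds[i], F_GETFD) == -1 && errno == EBADF)
            bad[count++] = fds[i];
    }
    return count;
}
```

Calling it right after select fails, with the same list that was used to build the fd_set, gives the best chance of catching the offender.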

I also rewrote the daemon to use poll. This did not help.

Does anyone have any other ideas? (Apart from "I made a dumb mistake", which is all too easy to do with select.)

+3  A: 

Most likely select is being called on a closed file descriptor. The usual source of that is reusing the fd_set without re-initializing it. Do you have anything going on in the signal handlers (like re-opening a log file on a HUP)?
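To illustrate the re-initialization point, here is a rough sketch that rebuilds the fd_set from the current descriptor list before every select call. The helper name wait_readable and its signature are assumptions made for the example, not anything from the daemon in question, and reporting out-of-range values as EBADF is this sketch's own choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: waits for readability on fds[0..n).
 * The fd_set is rebuilt from the list on every call because select()
 * overwrites it with the ready set, and a stale copy may still name
 * descriptors that have since been closed. */
int wait_readable(const int *fds, size_t n, fd_set *ready,
                  struct timeval *timeout)
{
    int maxfd = -1;

    assert(ready != NULL);
    FD_ZERO(ready);
    for (size_t i = 0; i < n; i++) {
        /* FD_SET is undefined for values outside [0, FD_SETSIZE). */
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE) {
            errno = EBADF;
            return -1;
        }
        FD_SET(fds[i], ready);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return select(maxfd + 1, ready, NULL, NULL, timeout);
}
```

The important part is that the list passed in is the single source of truth: whenever a descriptor is closed, it is removed from the list, and the next call cannot pick it up from an old fd_set.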

Nikolai N Fetissov
+2  A: 

If you use poll() then you can go through the data and look for which fd is failing, which is the big advantage.

James Antill
+1  A: 

I agree with James. With poll(), you have revents per fd which can easily be checked.

For example:

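#include <poll.h>

struct pollfd fds[NUM_FDS];
int ret, i;

...

ret = poll(fds, NUM_FDS, POLL_TIMEOUT);
if (ret > 0) {
  /* revents is filled in per descriptor, so the offending fd can be identified */
  for (i = 0; i < NUM_FDS; i++)
    if (fds[i].revents & (POLLHUP | POLLNVAL))
       ... do something ...
}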

Of course you would not implement it that way in the real world; it's just an example. I stopped using select() a long time ago, since poll() is a much better interface. You're correct, it's just too easy to shoot yourself in the foot with select().

Tim Post
I think in this particular situation it's also worth checking for POLLNVAL.
Nikolai N Fetissov
@Nikolai, yup. Example updated.
Tim Post