I have an application that runs on embedded linux (older kernel, 2.6.18). I'm using Live555. Occasionally when the camera is heavily loaded, my RTSP server (built using Live555) will hang indefinitely--no amount of connecting or cajoling seems to get it to snap out of it, short of resetting the application.

I narrowed the hang down to this code:

static int blockUntilReadable(UsageEnvironment& env,
                  int socket, struct timeval* timeout) {
  int result = -1;
  do {
    fd_set rd_set;
    FD_ZERO(&rd_set);
    if (socket < 0) break;
    FD_SET((unsigned) socket, &rd_set);
    const unsigned numFds = socket+1;

    result = select(numFds, &rd_set, NULL, NULL, timeout);  // <-- HANG

timeout is, of course, a NULL pointer, which tells select() to block until the socket is readable. The problem is that connecting to the RTSP server makes no difference: the call simply blocks indefinitely.

I did a netstat -an, and it always outputs something like:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:5222            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:5800            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:5000            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:5802            0.0.0.0:*               LISTEN
tcp       21      0 0.0.0.0:554             0.0.0.0:*               LISTEN

When it's in a failed state, I always see 21 on the Recv-Q, which is "The count of bytes not copied by the user program connected to this socket."

Does anyone have any idea what might be going south, or how I could troubleshoot this issue?

+1  A: 

That code looks pretty solid. I'm a little curious as to why you're casting to unsigned int, but it shouldn't hurt anything.

Some thoughts:

It's not hanging where you think it is. Hopefully you've double/triple checked this. (Check it again?)

Your netstat interpretation is wrong. The description you quoted applies, as the man page notes, to "Established" sockets. Yours is a listener, which is covered by the next sentence: "Listening: Since Kernel 2.6.18 this column contains the current syn backlog."

That looks like a huge backlog... which leads me to think you're not accept()-ing, perhaps because you're stuck in select(). That is the select() on your listening socket, right?

Last, double check that you're calling select() on the right socket, i.e., print out that socket arg and see if it is what it should be.
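Printing the raw descriptor number only goes so far, so here is a rough sketch of what "see if it is what it should be" could look like in practice. The helper names (is_listening_socket, local_port, describe_fd) are made up for illustration and are not part of Live555. It assumes an IPv4 socket and a kernel that supports SO_ACCEPTCONN, which I believe Linux does but which is worth confirming on a 2.6.18 target.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is a socket in the listening state,
 * 0 if it is a socket that is not listening, and -1 if the query fails
 * (for example because fd is not a valid socket). */
static int is_listening_socket(int fd) {
  int val = 0;
  socklen_t len = sizeof val;
  if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) < 0) return -1;
  return val ? 1 : 0;
}

/* Hypothetical helper: returns the local port (host byte order) that an
 * IPv4 socket is bound to, or -1 on failure or for other address families. */
static int local_port(int fd) {
  struct sockaddr_in addr;
  socklen_t len = sizeof addr;
  memset(&addr, 0, sizeof addr);
  if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) return -1;
  if (addr.sin_family != AF_INET) return -1;
  return ntohs(addr.sin_port);
}

/* Hypothetical helper: logs what is known about fd to stderr. */
static void describe_fd(const char *label, int fd) {
  fprintf(stderr, "%s: fd=%d listening=%d port=%d\n",
          label, fd, is_listening_socket(fd), local_port(fd));
}
```

Calling something like describe_fd("blockUntilReadable", socket) just before the select() would show whether the descriptor really is the listener bound to port 554. If it reports a non-listening socket or a different port, the backlog on 554 is not what select() is waiting on.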

Essentially, verify: 1) it is hanging in select() and 2) the arguments to select() are correct. I suspect one of those two is not true.
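For comparison, this is roughly how a healthy listener behaves: once a client has connected and is sitting in the backlog, select() on the listening socket reports it readable, even before accept() is called. The sketch below is a standalone loopback test, not Live555 code. The function names (make_listener, wait_readable, pending_connection_wakes_select) are invented for the example, and it uses a finite timeout instead of NULL so that it cannot hang.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: creates a TCP listener on 127.0.0.1 with an
 * OS-chosen port. Returns the fd (or -1) and stores the bound address. */
static int make_listener(struct sockaddr_in *addr) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  memset(addr, 0, sizeof *addr);
  addr->sin_family = AF_INET;
  addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr->sin_port = 0; /* let the kernel pick a free port */
  socklen_t len = sizeof *addr;
  if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
      listen(fd, 8) < 0 ||
      getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Hypothetical helper: waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
  if (fd < 0 || fd >= FD_SETSIZE) return -1;
  fd_set rd_set;
  FD_ZERO(&rd_set);
  FD_SET(fd, &rd_set);
  struct timeval tv;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  int n = select(fd + 1, &rd_set, NULL, NULL, &tv);
  if (n < 0) return -1;
  return (n > 0 && FD_ISSET(fd, &rd_set)) ? 1 : 0;
}

/* Returns 1 if the listener is not readable before a client connects and
 * is readable afterwards, 0 if not, -1 if the setup itself failed. */
static int pending_connection_wakes_select(void) {
  struct sockaddr_in addr;
  int listener = make_listener(&addr);
  if (listener < 0) return -1;
  int before = wait_readable(listener, 100); /* nothing pending yet */
  int client = socket(AF_INET, SOCK_STREAM, 0);
  if (client < 0) {
    close(listener);
    return -1;
  }
  /* The kernel completes the handshake; accept() is never called here. */
  int rc = connect(client, (struct sockaddr *)&addr, sizeof addr);
  int after = (rc == 0) ? wait_readable(listener, 1000) : -1;
  close(client);
  close(listener);
  return (before == 0 && after == 1) ? 1 : 0;
}
```

If a test like this passes on the target but the real server still sits in select() with connections queued on port 554, that points back at the arguments: most likely the descriptor being passed in is not the listening socket.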

Thanatos
“Is *socket* the listening/accepting socket?” was my first thought, too.
Chris Johnsen
Aye, my suspicion is that `socket` is ending up with an incorrect value here.
caf
Thanks Thanatos. FWIW, this isn't my code--it's part of the open source RTSP lib, Live555. I am positive it's hanging in select(). I am not positive the arguments to select are correct, so I'll validate that. And thanks for the heads up on the netstat changes.
kidjan