views:

322

answers:

6

Hi all,

A situation I have under Windows XP (SP3) has been driving me nuts, and I'm reaching the end of my tether, so maybe someone can provide some inspiration.

I have a C++ networking program (non-GUI). This program is built to compile and run under Windows, MacOS/X, and Linux, so it uses select() and non-blocking I/O as the basis for its event loop.

In addition to its networking duties, this program needs to read text commands from stdin, and exit gracefully when stdin is closed. Under Linux and MacOS/X, that's easy enough -- I just include STDIN_FILENO in my read fd_set to select(), and select() returns when stdin is closed. I check to see that FD_ISSET(STDIN_FILENO, &readSet) is true, try to read some data from stdin, recv() returns 0/EOF, and so I exit the process.

Under Windows, on the other hand, you can't select on STDIN_FILE_HANDLE, because it's not a real socket. You can't do non-blocking reads on STDIN_FILE_HANDLE, either. That means there is no way to read stdin from the main thread, since ReadFile() might block indefinitely, causing the main thread to stop serving its network function.

No problem, says I, I'll just spawn a thread to handle stdin for me. This thread will run in an infinite loop, blocking in ReadFile(stdinHandle), and whenever ReadFile() returns data, the stdin-thread will write that data to a TCP socket. That socket's connection's other end will be select()'d on by the main thread, so the main thread will see the stdin data coming in over the connection, and handle "stdin" the same way it would under any other OS. And if ReadFile() returns false to indicate that stdin has closed, the stdin-thread just closes its end of the socket-pair so that the main thread will be notified via select(), as described above.

Of course, Windows doesn't have a nice socketpair() function, so I had to roll my own using listen(), connect(), and accept() (as seen in the CreateConnectedSocketPair() function here. But I did that, and it seems to work, in general.

The problem is that it doesn't work 100%. In particular, if stdin is closed within a few hundred milliseconds of when the program starts up, about half the time the main thread doesn't get any notification that the stdin-end of the socket-pair has been closed. What I mean by that is, I can see (by my printf()-debugging) that the stdin-thread has called closesocket() on its socket, and I can see that the main thread is select()-ing on the associated socket (i.e. the other end of the socket-pair), but select() never returns as it should... and if it does return, due to some other socket selecting ready-for-whatever, FD_ISSET(main_thread_socket_for_socket_pair, &readSet) returns 0, as if the connection wasn't closed.

At this point, the only hypothesis I have is that there is a bug in Windows' select() implementation that causes the main thread's select() not to notice that the other end of the socket-pair has closed by the stdin-thread. Is there another explanation? (Note that this problem has been reported under Windows 7 as well, although I haven't looked at it personally on that platform)

A: 

Is it possible you have a race condition? Eg. Do you ensure that the CreateConnectedSocketPair() function has definitely returned before the stdin-thread has a chance to try closing its socket?

caf
That's a very good guess, but I double checked, and the stdin-thread isn't launched until after CreateConnectedSocketPair() has successfully returned. So I don't think that is the problem.
Jeremy Friesner
A: 

I am studying in your code. In the CreateConnectedSocketPair(), socket1 is used for listen(), and newfd is used for send/recv data. So, why does "socket1 = newfd"? How to close the listenfd then?

Ren Yufei
The sockets are reference counted (note that their type is ConstSocketRef, not int). So listenfd will be closed when its reference count falls to zero (i.e. by the assignment operator in the line you quoted)
Jeremy Friesner
A: 

Not a solution, but as a workaround, couldn't you send some magic "stdin has closed" message across the TCP socket and have your receiving end disconnect its socket when it sees that and run whatever 'stdin has closed' handler?

bdk
That would work for most cases, but if I want the connection to handle arbitrary data, then any "magic message" sequence of bytes might accidentally be sent by coincidence as part of the regular data being sent over stdin. I'd prefer to avoid that possibility if possible.
Jeremy Friesner
A: 

Honestly your code is too long and I don't have time right now to spend on it.

Most likely the problem is in some cases closing the socket doesn't cause a graceful (FIN) shutdown.

Checking for exceptions returning from your select may catch the remainder of cases. There is also the (slim) possibility that no notification is actually being sent to the socket that the other end has closed. In that case, there is no way other than timeouts or "keep alive"/ping messages between the endpoints to know that the socket has closed.

If you want to figure out exactly what is happening, break out wireshark and look for FINs and RSTs (and the absence of anything). If you see the proper FIN sequence going across when your socket is closed, then the problem must be in your code. if you see RST, it may be caught by exceptions, and if you don't see anything you'll need to devise a way in your protocol to 'ping' each side of the connection to make sure they are still alive, or set a sufficiently short timeout for more data.

SoapBox
"Honestly your code is too long and I don't have time right now to spend on it." -- This is unfair IMHO, introducing a potentially helpful answer. There isn't any code in the question (perhaps a problem in itself) but the question really does need that level of detail. Why not just say that you have a suggestion, hunch, hypothesis?
JXG
He links to a .cpp file which is over 1400 lines. I'm sorry, but analyzing that is out of the question. I did briefly look at it, and I couldn't even find the area in question (the only selects are for setting up an initial connection) leading me to believe it is the wrong file.
SoapBox
The link was relevant to the CreateConnectedSocketPair() function I mentioned, not to the event loop (which as you say, is in another file)
Jeremy Friesner
A: 

Rather than chasing perceived bugs in select(), I'm going to address your original fallacy that drove you away from simple, reliable, single-threaded design.

You said "You can't do non-blocking reads on STDIN_FILE_HANDLE, either. That means there is no way to read stdin from the main thread, since ReadFile() might block indefinitely" but this simply isn't the whole story. Look at ReadConsoleInput, WSAEventSelect, and WaitForMultipleObjects. The stdin handle will be signalled only when there is input and ReadConsoleInput will return immediately (pretty much the same idea behind select() in Unix).

Or, use ReadFileEx and WaitForMultipleObjectsEx to have the console reads fire off an APC (which isn't all that asynchronous, it runs on the main thread and only during WaitForMultipleObjectsEx or another explicit wait function).

If you want to stick with using a second thread to get async I/O on stdin, then you might try closing the handle being passed to select instead of doing a socket shutdown (via closesocket on the other end). In my experience select() tends to return really quickly when one of the fds it is waiting on gets closed.

Or, maybe your problem is the other way around. The select docs say "For connection-oriented sockets, readability can also indicate that a request to close the socket has been received from the peer". Typically you'd send that "request to close the socket" by calling shutdown(), not closesocket().

Ben Voigt
Well, I did spend a number of hours trying to get ReadFileEx() to do non-blocking reads on stdin... no dice. ReadFileEx() will still block on stdin, even if I set up all the overlapped-I/O stuff that is supposed to enable non-blocking behavior. I haven't tried ReadConsoleInput(), so perhaps that would work.In any case, the goal of the API is to allow the main thread to use the same select()-based event loop and work the same under all OS's... so forcing the event loop to use WaitForMultipleObjectsEx() under Windows isn't desirable.
Jeremy Friesner
Well, you simply can't open CONIN$ or any console for overlapped I/O, the CreateFile docs say all flags are ignored when accessing a console. Anyway, select really is a horrible API. On unix you usually want to use poll, which has semantics almost identical to WaitForMultipleObjectsEx. Plus, I've just today finished writing cross-platform code for this and it isn't too bad (actually the select and poll versions have been working for years, but they can only be interrupted on *nix, not Windows, so I just added the WaitForMultipleObjectsEx variant).
Ben Voigt
+1  A: 

Just for the record, this problem turned out to be a different issue entirely, unrelated to threading, Windows, or stdin. The actual problem was an inter-process deadlock, where the parent process was blocked, waiting for the child processes to quit, but sometimes the child processes would be simultaneously blocked, waiting on the parent to supply them with some data, and so nothing would move forward.

Apologies to all for wasting your time on a red herring; if there's a standard way to close this case as unwarranted, let me know and I'll do it.

-Jeremy

Jeremy Friesner