views:

304

answers:

2

Hi. Am a c-coder for a while now - neither a newbie nor an expert. Now, I have a certain daemoned application in C on a PPC Linux. I use PHP's socket_connect as a client to connect to this service locally. The server uses epoll for multiplexing connections via a Unix socket. A user submitted string is parsed for certain characters/words using strstr() and if found, spawns 4 joinable threads to different websites simultaneously. I use socket, connect, write and read, to interact with the said webservers via TCP on their port 80 in each thread. All connections and writes seems successful. Reads to the webserver sockets fail however, with either (A) all 3 threads seem to hang, and only one thread returns -1 and errno is set to 104. The responding thread takes like 10 minutes - an eternity long:-(. *I read somewhere that the 104 (is EINTR?), which in the network context suggests that ...'the connection was reset by peer'; or (B) 0 bytes from 3 threads, and only 1 of the 4 threads actually returns some data. Isn't the socket read/write thread-safe? I use thread-safe (and reentrant) libc functions such as strtok_r, gethostbyname_r, etc.

*I doubt that the said webhosts are actually resetting the connection, because when I run a single-threaded standalone (everything else equal) all things works perfectly right, but of course in series not parallel.

There's a second problem too (oops), I can't write back to the client who connect to my epoll-ed Unix socket. My daemon application will hang and hog CPU > 100% for ever. Yet nothing is written to the clients end. Am sure the client (a very typical PHP socket application) hasn't closed the connection whenever this is happening - no error(s) detected either. Any ideas?

I cannot figure-out whatever is wrong even with Valgrind, GDB or much logging. Kindly help where you can.

A: 

Yes, read/write are thread-safe. But beware of gethostbyname() and getservbyname() if you're using them - they return pointers to static data, and may not be thread-safe.

errno 104 is ECONNREFUSED (not EINTR). Use strerror or perror to get the textual error message (like 'Connection reset by peer') for a particular errno code.

The best way to figure out what's going wrong is often to do very detailed logging - log the results of every operation, plus details like the IP address/port connecting to, the number of bytes read/written, the thread id, and so forth. And, of course, make sure your logging code is thread-safe :-)

David Gelhar
Thanks for helpful suggestions. I confirm (as I mentioned previously) that I use the thread-safe and reentrant versions gethostbyname_r, strtok_r, etc.Also I have done every possible logging to actually nail down exactly where I see the code going dead - at read! Unfortunately I cannot ask 'read' to log out it's internals, I can only wait to receive its output and error codes.
A: 

Getting an ECONNRESET after 10 minutes sounds like the result of your connection timing out. Either the web server isn't sending the data or your app isn't receiving it.

To test the former, hookup a program like Wireshark to the local loopback device and look for traffic to and from the port you are using.

For the later, take a look at the epoll() man page. They mention a scenario where using edge triggered events could result in a lockup, because there is still data in the buffer, but no new data comes in so no new event is triggered.

Plasmatic