server:
VxWorks 6.3
Calls the usual socket, bind, listen, then:

for (;;)
{
  client = accept(sfd,NULL,NULL);
  // pass client to worker thread
}

client:
.NET 2.0
Uses the TcpClient constructor that takes a string hostname and an int port to connect to the server, like:

TcpClient client = new TcpClient(server_ip, port);

This works fine when the server is compiled and run on Windows (native C++).

Intermittently, the TcpClient constructor returns the instance without throwing any exception, but the accept call in VxWorks does not return with the client fd. tcpstatShow indicates that no accept occurred.

What could make the TcpClient constructor (which calls 'Connect') return the instance while the accept call on the server does not return? It seems to be related to what the system is doing in the background: the symptom is more likely when the server is busy persisting data to flash or to an NFS share as the client attempts to connect, but it can also happen when the server is not busy.

I've tried adjusting the priority of the thread running accept.
I've looked at the size of the queue in 'listen'. There's enough.
The total number of file descriptors available should be enough (I haven't validated this yet, though; first thing in the morning).

A: 

There could be many reasons, but we won't know unless we get more information from the server and client side. Does it throw any errors? A list of TCP/IP errors can be found here: Windows Socket Error. On the server side, are you catching any exceptions? Maybe you can try closing the connection (with a linger of 1 second) after it has an error?

jwee
That's exactly the problem - there is no error! The client side happily returns the TcpClient instance, but the server never returns from accept. On the server side, there will be no exceptions, as the socket libraries are C libraries, not C++. I'm getting to the point of being tempted to write my own protocol to confirm the accept worked: the server immediately sends data back to the client, and if the client doesn't get it within some fixed amount of time, it tries to Connect again...but what a kludge that would be!
paquetp
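The acknowledgement idea from the comment above can be sketched in C with plain BSD socket calls. On BSD-style stacks the network stack typically completes the TCP handshake and queues the connection until the application calls accept, which is why a successful connect alone does not prove the server application has the socket. The helper name connect_with_ack, the one-byte acknowledgement, and the millisecond timeout are assumptions made for illustration; the original client is .NET, and nothing here is specific to VxWorks or TcpClient.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): connect to ip:port,
 * then wait up to timeout_ms for the server to send one acknowledgement
 * byte. Returns the connected socket on success, or -1 if the connect
 * fails or no byte arrives in time. */
int connect_with_ack(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in addr;
    fd_set readable;
    struct timeval tv;
    char ack;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* connect has succeeded, but that only shows the handshake finished.
     * Wait for the application-level acknowledgement byte. */
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(fd + 1, &readable, NULL, NULL, &tv) != 1 ||
        recv(fd, &ack, 1, 0) != 1) {
        close(fd); /* timeout, error, or peer closed: caller may retry */
        return -1;
    }
    return fd;
}
```

A caller could retry a few times when this returns -1. The server side would have to send the single byte right after accept returns, so this only confirms the symptom and works around it; it does not explain why accept stalls.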
+1  A: 

Would it be possible for you to post a Wireshark/Netmon capture of what is happening on the wire?

stuck
Seconded. This will isolate whether the problem is on the client or server side, halving the number of places to look. Should this have been a comment though, instead of an answer?
Slartibartfast
Thirded. A Wireshark capture would be very helpful feedback.
Default
This looks like the way I'm going to go...didn't want to, was hoping someone would just mention something like - 'oh, TcpClient doesn't work with vxWorks, you have to get the Socket inside and set flag xyz or something'...oh well.
paquetp
A: 

Is it possible to bind the server on another port and see if it accepts there? If the client returns, it sounds like it's getting an accept from something on your server. I do not know about VxWorks, but on Windows you should always try to avoid binding to low port numbers (the well-known ports are those below 1024).

Default
A: 

Your server's accept() call looks wrong. The POSIX accept() call that I know has:

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen); 

where *addr is a required pointer that gets written to if the call works—indeed, one of the failure states for the call is:

[EFAULT]    The address parameter is not in a writable part of the user address space.

I haven't done Windows socket programming, but I understand it's POSIX-compliant, and Beej's guide doesn't mention any exceptions for Windows for accept(), so this should still apply. Somewhat relevant, the Python accept() call also 'returns' the address field (I say somewhat since Python did its best to emulate the C networking API as it made sense.)

I would suggest checking errno and using perror after the accept call in the server, to see if [EFAULT] is set (it will also inform you if you ran out of descriptors, as errno gets set to [EMFILE] or [ENFILE])
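A minimal sketch of that check, assuming the standard C errno and perror facilities are available on the target (VxWorks specifics may differ). The wrapper name accept_logged is hypothetical. It passes NULL for the address arguments, as the question does, so a failure reported here would show whether that call actually returns an error.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical wrapper (not from the original post): accept one client on
 * listen_fd and report any failure. Returns the client descriptor, or -1
 * with errno left as accept set it. */
int accept_logged(int listen_fd)
{
    int client = accept(listen_fd, NULL, NULL);

    if (client < 0) {
        int saved = errno;          /* perror/fprintf may change errno */
        perror("accept");
        if (saved == EMFILE || saved == ENFILE)
            fprintf(stderr, "accept: out of file descriptors\n");
        errno = saved;
    }
    return client;
}
```

In the server loop this would replace the bare accept call, skipping the hand-off to the worker thread whenever it returns -1. Note that it only helps if accept returns at all; a call that never returns produces no errno to inspect.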

If that doesn't prove to be the issue, use ncat, as either server or client, to investigate further. I'd run it with -vv since you want to know exactly when connections are made, what's sent etcetera.

alexandru
The sockets I'm using are not part of POSIX compliance (they're not Unix domain sockets); they're internet sockets, or 'BSD' sockets, which are not part of POSIX. The 2nd parameter of accept is optional as per http://opengroup.org/onlinepubs/007908799/xns/accept.html. You can pass NULL if you don't care what the client's actual address is. And if it were not allowed, accept would still return with an error, which mine does not.
paquetp