We have a hung process, and truss shows that it repeatedly tries to connect but fails with ECONNREFUSED. The man page says the following, but why is the connection rejected again and again?

 ECONNREFUSED            The attempt to connect was forcefully
                         rejected. The calling program should
                         close(2) the socket descriptor, and
                         issue another socket(3SOCKET) call to
                         obtain a new descriptor before
                         attempting another connect() call.

truss -p 2145

/3: lwp_park(0x00000000, 0) (sleeping...)
/2: nanosleep(0xFFFFFFFF7B5FBE60, 0xFFFFFFFF7B5FBE50) (sleeping...)
/2: nanosleep(0xFFFFFFFF7B5FBE60, 0xFFFFFFFF7B5FBE50) = 0
/2: so_socket(PF_INET, SOCK_STREAM, IPPROTO_TCP, "", SOV_DEFAULT) = 17
/2: fcntl(17, F_SETFD, 0x00000001) = 0
/2: connect(17, 0xFFFFFFFF7B5FBF40, 16, SOV_DEFAULT) Err#146 ECONNREFUSED
/2: close(17) = 0
/2: nanosleep(0xFFFFFFFF7B5FBE60, 0xFFFFFFFF7B5FBE50) (sleeping...)
/2: nanosleep(0xFFFFFFFF7B5FBE60, 0xFFFFFFFF7B5FBE50) = 0
/2: so_socket(PF_INET, SOCK_STREAM, IPPROTO_TCP, "", SOV_DEFAULT) = 17
/2: fcntl(17, F_SETFD, 0x00000001) = 0
/2: connect(17, 0xFFFFFFFF7B5FBF40, 16, SOV_DEFAULT) Err#146 ECONNREFUSED
/2: close(17) = 0
/2: nanosleep(0xFFFFFFFF7B5FBE60, 0xFFFFFFFF7B5FBE50) (sleeping...)
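The truss output is a client retry loop that follows the man page's advice: a fresh socket (so_socket), close-on-exec (fcntl F_SETFD 1), connect, close after ECONNREFUSED, sleep, repeat. Here is a minimal C sketch of that pattern, assuming IPv4 and a fixed retry delay. `connect_with_retry` and `try_connect_once` are hypothetical names for illustration, not the hung program's actual code:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: one connection attempt, i.e. one cycle of the truss output.
   Returns a connected fd, or -1 with errno set by connect(). */
static int try_connect_once(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); /* so_socket(PF_INET, ...) */
    if (fd < 0)
        return -1;
    (void)fcntl(fd, F_SETFD, FD_CLOEXEC);               /* fcntl(17, F_SETFD, 0x1) */
    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0)
        return fd;
    int saved = errno;
    close(fd);              /* close(17): a refused socket is not reused */
    errno = saved;
    return -1;
}

/* Hypothetical retry loop: up to max_attempts tries, sleeping delay_ms between
   refused attempts. Any error other than ECONNREFUSED stops immediately. */
int connect_with_retry(const char *ip, uint16_t port, int max_attempts, long delay_ms)
{
    assert(max_attempts > 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    for (int attempt = 1; ; attempt++) {
        int fd = try_connect_once(&addr);
        if (fd >= 0)
            return fd;
        if (errno != ECONNREFUSED || attempt == max_attempts)
            return -1;      /* errno still holds the connect() error */
        nanosleep(&delay, NULL);                        /* nanosleep(...) */
    }
}
```

For example, `connect_with_retry("127.0.0.1", 1521, 3, 500)` (1521 is the usual Oracle listener default, assumed here) would try three times, half a second apart. Retrying only helps if the server comes back; when every attempt is refused, as in the truss above, the cause is on the server side or in between (nothing listening on that address and port, or something actively rejecting the connection), not in the client loop.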

A: 

Firewall perhaps? There are lots of potential reasons.

Simon
A: 

Does it sometimes work from this machine and then start failing, or is the error returned every time? Does it work from some machines and not others?

The server program may have crashed or closed its listening socket. Run "netstat -af inet" on the server to confirm that there is a socket in LISTEN state on that port, and to check how many connections that port currently has. On Solaris, "pfiles pid" on the server process ID can also confirm that the server still has the listening socket open, and show the current number of client connections. If many connections are being made, make sure the listen() backlog is large enough.

On the client, add the -vall option to your truss command to show the address and port you are connecting to, so you can confirm they are correct. Also try making the same connection from the server machine itself to rule out any network, firewall, or NAT issue.
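For the "try the same connection from the server machine" step, a small probe can help tell the failure modes apart: ECONNREFUSED means a rejection came back (typically nothing is listening, or a firewall is actively rejecting), while a timeout usually means packets are being silently dropped. This is a hypothetical sketch assuming IPv4; `probe_tcp` and its timeout parameter are illustrative names, not a Solaris or Oracle tool:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: returns 0 if something accepted the TCP connection,
   otherwise an errno value such as ECONNREFUSED or ETIMEDOUT. */
int probe_tcp(const char *ip, uint16_t port, int timeout_ms)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return errno;

    /* Non-blocking connect so the wait can be bounded by timeout_ms. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int err = errno;
        close(fd);
        return err;
    }

    int err = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        err = errno;                     /* may already be ECONNREFUSED */
        if (err == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n = poll(&pfd, 1, timeout_ms);
            if (n == 0) {
                err = ETIMEDOUT;         /* no answer at all: likely dropped */
            } else if (n < 0) {
                err = errno;
            } else {
                /* Connect finished; SO_ERROR holds its result (0 = success). */
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0)
                    err = errno;
            }
        }
    }
    close(fd);
    return err;
}
```

Running it against the listener's address and port, first from the client and then on the server itself, narrows things down: refused on both points at the server (no LISTEN socket), while success on the server but refused or timed out from the client points at the network path.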

mark4o
Actually, this process is an Oracle listener, which is a server rather than a client. Isn't that odd, since this error only happens in a client process?
Daniel
ECONNREFUSED is an error returned from connect(), so it can only occur in a client (if a client is defined as the party that initiates the connection).
mark4o
Thanks Mark, I understand more now. A server process (the Oracle listener) was hanging, so we ran "lsnrctl stop" to stop the listener, but "lsnrctl stop" hung as well, and truss on "lsnrctl stop" reported the ECONNREFUSED error. So in this case "lsnrctl stop" is the client and the Oracle listener is the server. I still don't know why the error happens, but I understand the situation better. Thanks again.
Daniel