I have several nodes acting as both servers and clients using Java's TCP sockets, i.e., Socket and ServerSocket. Each node uses persistent connections to communicate with 4 to 10 neighbors. However, sometimes a node (node1) may throw the following exception when trying to connect to another node (node2):

java.net.ConnectException: Connection refused

If I run netstat on node2, it shows that a TCP connection has been established with node1 on the appropriate port (61685, in this case).

tcp 0 0 (node2):61685 (node1):55150 ESTABLISHED

However, node1 throws the same exception every time it tries to connect.

The ServerSocket is created as follows:

void OpenRcvSocket(final int port) {
    Thread rcvthread = new Thread () {
        @Override
        public void run () {
            ServerSocket rcvlistener = null;
            boolean running = true;

            try {
                rcvlistener = new ServerSocket(port);
                while(running) {
                    // Block until a neighbor connects, then hand the socket off
                    Socket incoming = rcvlistener.accept();
                    new ConnectionHandler(incoming);
                }
            } catch (IOException ex) {
                System.out.println(ex);
            } finally {
                // rcvlistener is still null if the ServerSocket constructor threw
                if (rcvlistener != null) {
                    try {
                        rcvlistener.close();
                    } catch (IOException ex) {
                        System.out.println(ex);
                    }
                }
            }
        }
    };
    rcvthread.start();
}

The sending portion looks like this:

synchronized void SendMsg(String dest, Message myMsg) {
    PrintWriter printwr = SendingConnectionList.get(dest);
    try {
        if(printwr == null) {
            Socket sendsock = new Socket(dest, port);
            printwr = new PrintWriter(sendsock.getOutputStream(), true);
            SendingConnectionList.put(dest, printwr);
        }
        printwr.print(myMsg.MsgToString());
        printwr.flush();
    } catch (UnknownHostException ex) {
        System.out.println(dest+": "+ex);
    } catch (IOException ex) {
        System.out.println(dest+": "+ex);
    }
}

The strange thing is that these nodes usually don't refuse all incoming connections: out of 10 neighbors, 6 might be able to connect, whereas 4 are rejected. I doubt there are firewalls running on all of the nodes that I've tried, and I am pretty sure the service is running on the port. Is there any other reason this exception would be thrown? Thanks!

A: 

You may want to check your java security policy file.

Default Policy File

There may be something in there preventing your JVM from using the ports/address.

Tazzy531
The thing is, not all connections are refused. Some nodes manage to connect whilst others are turned down. If no nodes could connect to that node, the problem could be related to the node itself blocking connections. Could there be any other reason for this exception to be thrown?
thodinc
A: 

Under Windows you may have to tell Windows Firewall to allow Java to do network activity. This might have been accidentally denied.

Thorbjørn Ravn Andersen
All of the nodes are running Fedora Core 8, so unless the owners of the nodes put up a firewall themselves, there shouldn't be any. I'm working with 400 nodes at the moment and the application manages to run on about 320 of them without any problems, and it's doubtful that 80 nodes have set up firewalls when they shouldn't have. I have sent the owners e-mails to find out if that's the case, but is there any other reason this could be happening, especially since netstat shows an ESTABLISHED status?
thodinc
In that case I would investigate closely if the process might not be listening on ALL network interfaces but only one.
Thorbjørn Ravn Andersen
Thanks for your reply. The nodes are supposed to have identical hardware configurations, and the few I checked only have one Ethernet interface, loopback and a tap interface.
thodinc
You should check that the process listens to ALL of them. This is important. Otherwise you can have a process that only responds to e.g. localhost:8080 but not externalipname.com:8080.
Thorbjørn Ravn Andersen
As far as I know, when I declare a new ServerSocket with only a port specified, it listens on all interfaces. Additionally, running
netstat -an | grep LISTEN | grep tcp
shows me the same:
tcp 0 0 0.0.0.0:61694 0.0.0.0:* LISTEN
I still haven't been able to deduce why some connections are refused whereas others aren't. It can't be a firewall that's the problem in this case, nor would any connections be accepted if a service wasn't running on the port. Could it be that my implementation of Sockets and ServerSockets isn't allowing enough connections? Thanks.
thodinc
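The wildcard bind discussed above can also be checked from inside the JVM rather than with netstat. This is a minimal sketch, assuming a modern JDK (Java 7 or later for try-with-resources, so not the GCJ runtime used in the thread); the class name `BindCheck` and the method `listensOnAllInterfaces` are made up for illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class BindCheck {
    /**
     * Opens a ServerSocket with only a port given and reports whether
     * it ended up bound to the wildcard address (0.0.0.0 or ::).
     */
    static boolean listensOnAllInterfaces(int port) throws IOException {
        try (ServerSocket listener = new ServerSocket(port)) {
            return listener.getInetAddress().isAnyLocalAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system for any free port.
        System.out.println("wildcard bind: " + listensOnAllInterfaces(0));
    }
}
```

If this reports a wildcard bind, the listener is not restricted to a single interface, which matches the `0.0.0.0:61694` line in the netstat output.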
Ok, another problem that can arise is if you are too slow to accept() the incoming connections. The operating system maintains a queue of incoming connections not yet accepted, and if _THAT_ queue fills up, you get Connection refused. I am not intimately familiar with socket programming, so I cannot tell if you set up a new listener every time in your snippet?
Thorbjørn Ravn Andersen
while(running) { Socket incoming = rcvlistener.accept(); new ConnectionHandler(incoming); }
That's certainly possible, but the ConnectionHandler class starts a new thread to handle the incoming connection, and then the while loop goes back to waiting for a new connection. I could try manually increasing the backlog of the ServerSocket to see if that makes a difference.
thodinc
I used a backlog size of 10 and it doesn't really seem to make any difference. I'm still getting a lot of "Connection refused" messages, unfortunately.
thodinc
Please elaborate on "backlog"? The queue I talk about is in the underlying operating system.
Thorbjørn Ravn Andersen
When you set up a ServerSocket in Java, you can specify the maximum queue length for incoming connection requests. Any more incoming connections are rejected. However, setting a backlog of even 30 didn't help. Many nodes are still partially refusing connections.
thodinc
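For readers unfamiliar with the backlog parameter mentioned above, here is a minimal sketch of how it is passed and what the queue does. It assumes a modern JDK (Java 7 or later); `BacklogDemo`, `openListener` and `connectBeforeAccept` are hypothetical names, and the backlog of 50 is an arbitrary choice, not a recommendation.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BacklogDemo {
    /** Opens a listener whose pending-connection queue is requested to hold `backlog` entries. */
    static ServerSocket openListener(int port, int backlog) throws IOException {
        return new ServerSocket(port, backlog);
    }

    /**
     * Connects a client while nobody is blocked in accept(), then accepts it.
     * The connection completes in the operating system's queue first and is
     * handed to the application only when accept() is finally called.
     */
    static boolean connectBeforeAccept() throws IOException {
        try (ServerSocket listener = openListener(0, 50);
             Socket client = new Socket(InetAddress.getLoopbackAddress(),
                                        listener.getLocalPort());
             Socket accepted = listener.accept()) {
            return client.isConnected() && accepted.isConnected();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("queued connection accepted: " + connectBeforeAccept());
    }
}
```

The backlog value is only a request to the operating system, which may cap it (on Linux, via `net.core.somaxconn`), so raising it in Java alone does not guarantee a longer queue.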
If I print out the stack trace, it shows:
java.net.ConnectException: Connection refused
    at gnu.java.net.PlainSocketImpl.connect(libgcj.so.8rh)
    at java.net.Socket.connect(libgcj.so.8rh)
    at java.net.Socket.connect(libgcj.so.8rh)
    at java.net.Socket.<init>(libgcj.so.8rh)
    at java.net.Socket.<init>(libgcj.so.8rh)
thodinc
Ah, have you remembered to install and select a Sun Java implementation?
Thorbjørn Ravn Andersen
Actually, I'm just using GCJ which I installed using "yum install" on Fedora. Does that make any difference when working with ServerSockets?
thodinc
It might. I would certainly try this first.
Thorbjørn Ravn Andersen
Just an update: I've discovered that the "Connection refused" exception occurs because the thread seems to freeze on some nodes for some time and then resumes execution exactly where it left off. I don't know whether this is caused by the program or by overloaded nodes, and I can't think of anything in the code besides Thread.sleep that would make a thread stall.
thodinc
Freeze? Sounds like postponed garbage collection on a large heap. Can you attach jvisualvm to see what actually goes on? If not, then enable seeing garbage collection debug output.
Thorbjørn Ravn Andersen
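One rough way to tell a garbage collection pause from an overloaded node, as suggested above, is to compare how long a short sleep really takes with how much collection time the JVM reports over the same interval. This is a sketch only, assuming a HotSpot-style JVM that exposes the standard `java.lang.management` beans (the GCJ runtime in the thread may not); `PauseProbe` and its methods are hypothetical names, and the one-second threshold is arbitrary. On HotSpot, the `-verbose:gc` flag prints collection activity directly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class PauseProbe {
    /** Total milliseconds the JVM reports having spent in garbage collection so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Sleeps for sleepMillis and returns how much longer than requested the thread was away. */
    static long overshootMillis(long sleepMillis) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMillis);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return elapsedMillis - sleepMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcMillis();
        long overshoot = overshootMillis(100);
        long gcDelta = totalGcMillis() - gcBefore;
        // A large overshoot with little GC time points away from the collector
        // and toward the node itself (CPU contention, swapping, and so on).
        if (overshoot > 1000) {
            System.out.println("stalled " + overshoot + " ms, GC accounted for " + gcDelta + " ms");
        } else {
            System.out.println("no significant stall (overshoot " + overshoot + " ms)");
        }
    }
}
```

Running the probe in a background thread next to the real application would show whether the freezes line up with collection time or not.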