ansaurus

Question

Socket Bind Error

Answer 1

+11 A:

I think you may be going too fast.

Most operating systems have a limit on the number of sockets they can have open at any one time but it's actually worse than that.

When a socket is closed down, it is put in a special time-wait state for a certain amount of time. This is usually twice the packet time-to-live value and it ensures that there aren't still packets out in the network that are on the way to your socket.

Once that time expires, you can be sure that all packets out in the network have already died. The socket is placed in that special state so that packets that were out in the network when you shut it down can be captured and thrown away if they arrive before they die.

I think that's what's happening in your case: the sockets aren't being freed as quickly as you think.

We had a similar problem with code that opened lots of short-lived sessions. It ran fine for a while but then the hardware got faster, allowing many more to be opened in a given time period. This manifested itself as inability to open more sessions.

One way to check this is to do netstat -a from the command line and see how many sessions are actually in the wait state.
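As a rough illustration of that check, the sketch below counts the TIME_WAIT entries in netstat output from Java. The class and method names are made up for this example, and it assumes that netstat is on the PATH and prints the state as the literal text TIME_WAIT (the -n flag only skips name resolution).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper for illustration; not part of any library.
final class TimeWaitCounter {

    // Counts the lines of netstat-style output that mention TIME_WAIT.
    static int countTimeWait(String netstatOutput) {
        int count = 0;
        for (String line : netstatOutput.split("\n")) {
            if (line.contains("TIME_WAIT")) {
                count++;
            }
        }
        return count;
    }

    // Runs "netstat -an" (assumed to be available) and counts TIME_WAIT entries.
    static int countTimeWaitNow() throws IOException {
        Process process = new ProcessBuilder("netstat", "-an")
                .redirectErrorStream(true)
                .start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        return countTimeWait(output.toString());
    }
}
```

Calling TimeWaitCounter.countTimeWaitNow() before and after a test run gives a quick idea of whether the count climbs towards the port limit.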

If that does turn out to be the case, there are a few ways to handle it.

Re-use your sessions, either manually or by maintaining a connection pool.
Introduce a delay in each connection to try and stop reaching the saturation point.
Go flat out until you reach saturation and then modify your behaviour, such as running your connect logic inside a while statement that retries for up to 60 seconds with a two-second delay each time before giving up totally. This lets you run at full speed, slowing down only if there's a problem.

That last bullet point deserves some expansion. In our aforementioned application we actually used a back-off strategy, which gradually lessens the load on a resource provider that is complaining. So, instead of 30 two-second delays, we opted for a one-second delay, then two seconds, then four, and so on.

The general process for a back-off strategy is as follows and it can be used in any case where there may be temporary shortages of a resource. The action alluded to in the pseudo-code below would be the opening of a socket in your case.

set maxdelay to 16 # maximum time period between attempts
set maxtries to 10 # maximum attempts

set delay to 0
set tries to 0
while more actions needed:
    if delay is not 0:
        sleep delay
    attempt action
    if action failed:
        add 1 to tries
        if tries is greater than maxtries:
            exit with permanent error
        if delay is 0:
            set delay to 1
        else:
            double delay
            if delay is greater than maxdelay:
                set delay to maxdelay
    else:
        set delay to 0
        set tries to 0

This allows the process to run at full speed in the vast majority of cases but backs off when errors start occurring, hopefully giving the resource provider time to recover. The gradual increase in delays allows for more serious resource restrictions to recover and the maximum tries catches what you would term permanent errors (or errors that are taking too long to recover).
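For what it's worth, the pseudo-code above might translate to Java roughly as follows. This is only a sketch: the class and method names are invented here, the action is any function that returns true on success (opening the socket, in your case), and the sleep is passed in as a parameter so the loop can be exercised without really waiting. As in the pseudo-code, the check is "greater than", so the action is attempted up to 11 times before giving up.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

// Hypothetical helper for illustration; mirrors the pseudo-code for one action.
final class Backoff {
    static final int MAX_DELAY_SECONDS = 16; // maximum time period between attempts
    static final int MAX_TRIES = 10;         // maximum failed attempts tolerated

    // Delay to use after a failure: 0 becomes 1, otherwise double, capped at the maximum.
    static int nextDelay(int delaySeconds) {
        if (delaySeconds == 0) {
            return 1;
        }
        return Math.min(delaySeconds * 2, MAX_DELAY_SECONDS);
    }

    // Repeats the action until it succeeds; returns false on a "permanent" error.
    static boolean runWithBackoff(BooleanSupplier action, LongConsumer sleepMillis) {
        int delay = 0;
        int tries = 0;
        while (true) {
            if (delay != 0) {
                sleepMillis.accept(delay * 1000L);
            }
            if (action.getAsBoolean()) {
                return true;
            }
            tries++;
            if (tries > MAX_TRIES) {
                return false;
            }
            delay = nextDelay(delay);
        }
    }

    // Default sleeper: Thread.sleep that restores the interrupt flag if interrupted.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would wrap its own connect logic in a method that returns true or false and pass it in as Backoff.runWithBackoff(action, Backoff::sleepQuietly), resetting nothing itself since each call starts again with no delay.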

paxdiablo 2009-07-06 06:34:58

You'll have to configure the socket time_wait and other related parameters depending on the OS (of the machine you are connecting to).

Ryan Fernandes 2009-07-06 06:37:50

It's not always a good idea to fiddle with these parameters. Most should be tuned for the network characteristics and then your app should be tuned for that. Reducing time_wait without reducing time-to-live will result in spurious packets arriving. Reducing TTL to the point where the packets can't reach the destination means lots of dropped packets. Ideally, you should either keep connections open (manually or with a connection pool) or tune your app's behaviour (such as with a delay mentioned by @Stu in his answer).

paxdiablo 2009-07-06 06:40:44

Answer 2

+2 A:

My suggestions:

flush the socket after the write
add a tiny sleep (~50ms?) at the end of the above method

@Pax has a good point about the state of the socket afterwards. Try your code, let it fail, and then do a netstat and analyze it (or post it here).

Stu Thompson 2009-07-06 06:35:56

Actually, I had a sleep time, but it was only 5ms.

rantravee 2009-07-06 06:48:31

Answer 3

+1 A:

I agree with others that you're running out of socket endpoints. However, that's not 100% crystal clear from your example, as presumably the exception is coming from a connect() or bind() call that may be underlying some other high-level Java method.

One should also underscore that running out of endpoints isn't some kind of bizarre restriction of the socket library but a pretty fundamental part of any TCP/IP implementation. You need to keep information about old connections around for a little while so that late-arriving IP packets for an old connection are dropped.

setReuseAddress() corresponds to the low-level SO_REUSEADDR socket option and only applies to the server when it does a listen().

George Phillips 2009-07-06 07:23:27

Aditya Sehgal 2009-07-06 08:29:32

Answer 4

A:

What operating system? If you're using Windows, and I'm guessing you are, then there's a limit to the number of client connections that you can have (this is configured by the MaxUserPort registry entry, which defaults to 5000 and so leaves roughly 4000 ports for outbound connections; see http://technet.microsoft.com/en-us/library/aa995661.aspx and http://www.tech-archive.net/Archive/Windows/microsoft.public.windows.server.general/2008-09/msg00611.html for details of changing it). That, coupled with the fact that you're initiating the socket close from your client, and so accumulating sockets in the TIME_WAIT state on your client, is likely the cause of your problem.
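To put a rough number on that, the sketch below estimates how many new outbound connections per second can be sustained before the ports run out. The figures are assumptions, not measurements: roughly 4000 usable ports as mentioned above, and a TIME_WAIT of 240 seconds, which I believe is the Windows default but which you should check for your system. The class and method names are hypothetical.

```java
// Back-of-the-envelope estimate; all inputs are assumptions to be checked locally.
final class PortBudget {

    // If every closed connection holds its port for timeWaitSeconds, at most
    // usablePorts connections can be opened per TIME_WAIT period.
    static double sustainableConnectionsPerSecond(int usablePorts, int timeWaitSeconds) {
        return (double) usablePorts / timeWaitSeconds;
    }
}
```

With 4000 ports and a 240-second TIME_WAIT this comes to a little under 17 connections per second, which a tight loop on fast hardware can easily exceed.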

Please note that the solution to the TIME_WAIT accumulation issue is NOT to fiddle with the TCP stack's parameters to make the problem go away. TIME_WAIT exists for a very good reason and removing or shortening it will likely cause you new problems!

So, assuming you're on a Windows machine, step one is to tune your MaxUserPort value so that you have more dynamic ports available for your outbound connections. Next, if this doesn't fix things for you, you can think about which side of the connection should end up with the TIME_WAIT (assuming you can control the protocol used on your connections...) The peer that issues the 'active close' is the one that ends up with the TIME_WAIT so if you can change things so that your servers issue the active close then the TIME_WAIT sockets will accumulate on the server rather than on the client and this MAY be better for you...

Len Holgate 2009-07-06 15:10:30

Indeed, I'm using Windows XP. I don't know what the limit is for incoming connections, but if it is the same as for outgoing connections, then I'm just relocating the problem to the server. Still, I think that maximizing the outgoing ports is a good approach. Thanks!

rantravee 2009-07-06 16:26:46

There is no corresponding limit on inbound connections.

Len Holgate 2009-07-06 17:52:24

Answer 5

A:

I think this is the same as this question (and I've linked to my answer, which I think might possibly help.)

http://stackoverflow.com/questions/846861/java-bind-exception/899394#899394

rascher 2009-07-08 03:50:45

Answer 6

+1 A:

If the sample code is actually how you are executing the loop, you may have things in the wrong order.

The Javadoc for setReuseAddress says: "The behaviour when SO_REUSEADDR is enabled or disabled after a socket is bound (See isBound()) is not defined."

Try moving the call to somewhere before you bind() or connect().
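Something along these lines is what I mean. It is only a sketch, assuming a plain java.net.Socket client; the class and method names are invented for the example and the host, port and timeout are whatever your code already uses. The point is simply that the socket is created unconnected, the option is set, and only then is connect() called.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper for illustration; not taken from the question's code.
final class ReuseAddressExample {

    // Creates an unconnected, unbound socket with SO_REUSEADDR already enabled.
    static Socket newUnboundSocket() throws SocketException {
        Socket socket = new Socket();   // no-arg constructor: not yet bound or connected
        socket.setReuseAddress(true);   // set while the socket is still unbound
        return socket;
    }

    // Connects only after the option has been set; closes the socket on failure.
    static Socket open(String host, int port, int timeoutMillis) throws IOException {
        Socket socket = newUnboundSocket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```

Note that new Socket(host, port) connects inside the constructor, so any setReuseAddress call made after it comes too late.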

Duck 2009-07-08 04:34:53

Answer 7

A:

Sometimes socket.close() does not release the socket immediately, and the loop (which retries the socket connection with the same IP and port) runs much faster than that, so please also set the socket to null.

socket_server.close();

socket_server = null;

Thanks Sunil Kumar Sahoo

Deepak 2010-04-16 14:31:09
