
I am testing on a local Linux box with both the server and the client running on the same machine. After about 1024 connections, the connect() call in my client code fails with "connection refused". At first I thought it was select's FD_SETSIZE limit of 1024, so I changed the server to use poll() instead of select(), but I still can't get past this number. My ulimit -n is set to 2048, and watching lsof on the server it reaches about 1033 (not sure if that is the exact number) and then fails. Any help is much appreciated.

+1  A: 

Maybe you reached your process limit for open file descriptors.

I'm not sure if I understand you correctly: do you have both the server side and the client side in the same process? Then you will use twice as many file descriptors, which comes close to what you see with ulimit. If that is not the case, could the problem be on the server side? Maybe the server process runs out of descriptors and can no longer accept any more connections.

The accept man page mentions that you should get a return value of:

EMFILE
The per-process limit of open file descriptors has been reached.

ENFILE
The system limit on the total number of open files has been reached.

What error code do you get? Obviously you can only add connections that were successfully accept()ed to your select or poll set.
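For illustration, a minimal sketch of how the server's accept loop could report which of the two it is hitting (listen_fd and the messages are placeholders, not your actual code):

/* Sketch: accept one connection and report EMFILE/ENFILE explicitly.
 * listen_fd is assumed to already be bound and listening. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static int accept_one(int listen_fd)
{
    int conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd < 0) {
        if (errno == EMFILE)
            fprintf(stderr, "accept: per-process fd limit reached (%s)\n", strerror(errno));
        else if (errno == ENFILE)
            fprintf(stderr, "accept: system-wide file limit reached (%s)\n", strerror(errno));
        else
            perror("accept");
    }
    return conn_fd;
}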

I know you already know how to check ulimit, but others may not:

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 40448
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 40448
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
lothar
Thanks for your quick response. Let me explain in a little more detail: the server and the client are two separate processes on the machine. The server is more of a manager which keeps track of all new client processes. Each client process registers itself with the server, which is listening on a port. Once ~1024 clients have registered, further clients get a connection refused. I checked ulimit -a and I have it set to 2048 for the soft limit and 4096 for the hard limit.
Gentoo
@Gentoo Do you get an error from the accept call in the server? If so, which one?
lothar
@Gentoo unfortunately you will need to know the server return value from accept. Maybe using strace on the server will shed some light on this.
lothar
@lothar So I am working with a server provided to me by a different group. In my client code, the following connect() gives me the connection refused error: if (connect(sock_fd, (struct sockaddr *) ...) < 0) { (void) close(sock_fd); return(-1); }
Gentoo
Thanks lothar. I will try doing that.
Gentoo
A: 

Is there any danger that the server opens a separate log file for each connection it accepts?

What upper limit does the other group say the server has?

There was a bit of code in one program I looked after (a good few years ago) that set the maximum file size to 1 MB. 'Twas a pity that, when first added, it increased the limit, but the passage of time and the growth of default limits meant it later ended up shrinking it! Is there any possibility that the server has a similar problem: it sets the maximum number of open files to what was once a ridiculously high number, like 1024?
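To make the idea concrete, this is the sort of thing that might be lurking in the server's startup code (purely hypothetical; the function name is mine):

/* Hypothetical: a "generous" limit hard-coded years ago. On a box where
 * ulimit -n is 2048, this silently drops the process back to 1024. */
#include <sys/resource.h>

static void set_fd_limit(void)
{
    struct rlimit rl = { .rlim_cur = 1024, .rlim_max = 1024 };
    (void) setrlimit(RLIMIT_NOFILE, &rl);
}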

Jonathan Leffler
A: 

Apologies for mostly trivial questions :)
Did you recompile the server when you say "changed to poll"? Is the server running under the same account? Is it a forking or perhaps a threaded server? Do you get errno == ECONNREFUSED after the call to connect() on the client? Can you confirm with tcpdump that you get an RST in response to the SYN? Do client port numbers get reused? Are there connections in TIME_WAIT state?
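For the errno question, a small client-side sketch (the wrapper is mine; only connect()/errno are from your code) that records exactly which error the failing connect() returns:

/* Sketch: report the precise errno when connect() fails. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int try_connect(int sock_fd, const struct sockaddr *addr, socklen_t len)
{
    if (connect(sock_fd, addr, len) < 0) {
        fprintf(stderr, "connect failed: errno=%d (%s)%s\n",
                errno, strerror(errno),
                errno == ECONNREFUSED ? " -- server refused (RST)" : "");
        (void) close(sock_fd);
        return -1;
    }
    return 0;
}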

Nikolai N Fetissov
+1  A: 

If you are connecting faster than your server is calling accept(), the queue of pending connections may be full. The maximum queue length is set by the second argument to listen() in the server, or the value of sysctl net.core.somaxconn (normally 128) if lower.
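As a sketch of the server side (the port and backlog values are only examples, and the kernel still clamps the backlog to net.core.somaxconn):

/* Sketch: create a listening socket with a larger backlog.
 * The effective queue length is min(backlog, net.core.somaxconn). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {   /* e.g. backlog = 1024 instead of 20 */
        close(fd);
        return -1;
    }
    return fd;
}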

mark4o
Thanks.. didn't know about this setting.. will check my system when i get to work...
Gentoo
A: 

Your limitation comes from the Linux per-user limit. If not specified, the Linux default is 1024 open files. To change it permanently, edit /etc/security/limits.conf and add

user soft nofile 16535
user hard nofile 16535

or from console try

ulimit -n 16535
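Since a limit set in one shell does not necessarily reach a daemon started elsewhere, a quick sketch to confirm what the running process actually sees:

/* Sketch: print the RLIMIT_NOFILE soft/hard limits of the current process. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("open files: soft=%lu hard=%lu\n",
           (unsigned long) rl.rlim_cur, (unsigned long) rl.rlim_max);
    return 0;
}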

Regards

Sacx
I have already set this to 2048 for the soft limit and 4096 for the hard limit.
Gentoo
A: 

So, after a little more research, it looks like my server-side listen() has a queue depth of 20. I am thinking that's the reason. Do any of you think that is the problem too?

Regards

Gentoo
Probably not, in all honesty, though it might be. The queue depth is how many outstanding (not yet accepted) connection requests can be pending at once. If you are flooding the server with connection requests before the previous ones complete, then maybe; if you are making the requests synchronously, then probably not.
Jonathan Leffler
This is an automated workload for 2000 users, and the user connections are not synchronized. That's why I think the queue depth could be the problem. I have asked my server team to change the depth and am waiting to test.
Gentoo
A: 

I saw the comment you made with the close(sock_fd) statement in an error handling routine.

Are you explicitly closing your sockets after they are used, with close() or shutdown()?

I would guess not. You actually have 1024+ concurrent active connections? You would have to have pthreads involved in order to do this. Is that correct?

jim mcnamara