I am testing on a local Linux server with both the server and the client running on the same machine. After about 1024 connections, my client code gets "connection refused" when it calls connect. At first I thought it was the FD_SETSIZE limit of 1024 for select, so I changed the server to use poll instead of select, but I still can't get past this number. My ulimit -n is set to 2048, and when I monitor lsof on the server it reaches about 1033 (not sure if that is the exact number) and then fails. Any help is much appreciated.
Maybe you reached your process limit for open file descriptors.
I'm not sure I understand you correctly: do you have both the server side and the client side in the same process? Then each connection costs you two file descriptors (one for each end), so roughly 1024 connections would already use about 2048 descriptors, which comes close to your ulimit of 2048. If that is not the case, could the problem be on the server side? Maybe the server process runs out of descriptors and can no longer accept any more connections.
The accept man page mentions that you can get a return value of -1 with errno set to:
EMFILE: The per-process limit of open file descriptors has been reached.
ENFILE: The system limit on the total number of open files has been reached.
What error code do you get? You can obviously only add connections that were successfully accept()ed into select or poll.
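If it helps, here is a minimal, purely illustrative sketch (the function name accept_one is mine, not from your code) of logging the exact errno when accept() fails, so EMFILE/ENFILE can be told apart from other failures:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Hypothetical accept wrapper: report descriptor exhaustion
       (EMFILE/ENFILE) separately from other accept() failures. */
    int accept_one(int listen_fd)
    {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0) {
            if (errno == EMFILE)
                fprintf(stderr, "accept: per-process fd limit reached\n");
            else if (errno == ENFILE)
                fprintf(stderr, "accept: system-wide file table full\n");
            else
                fprintf(stderr, "accept: %s\n", strerror(errno));
            return -1;
        }
        return conn_fd;  /* only a successfully accepted fd goes into select/poll */
    }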
I know you already know how to check ulimit, but others may not:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 40448
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 40448
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Is there any danger that the server opens a separate log file for each connection it accepts?
What upper limit does the other group say the server has?
There was a bit of code in one program I looked after (a good few years ago) that set the maximum file size to 1 MB. 'Twas a pity that when it was first added it increased the limit, but with the passage of time and the growth of file limits it later ended up shrinking it instead! Is there any possibility that the server has a similar problem: it sets the maximum number of open files to a once "ridiculously high" number like 1024?
Apologies for mostly trivial questions :)
Did you recompile the server when you say "changed to poll"? Is the server running under the same account? Is it a forking or maybe a threaded server? Do you get errno == ECONNREFUSED after the call to connect() on the client? Can you confirm with tcpdump that you get an RST in response to the SYN? Do client port numbers get reused? Are there connections in TIME_WAIT state?
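For the errno question, something along these lines on the client (the function name and address handling are illustrative, not taken from your code) will tell you whether it really is ECONNREFUSED:

    #include <arpa/inet.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Hypothetical client connect: report ECONNREFUSED explicitly so a
       refused connection can be distinguished from other errors. */
    int connect_client(const char *ip, unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return -1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
            fprintf(stderr, "bad address: %s\n", ip);
            close(fd);
            return -1;
        }

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            if (errno == ECONNREFUSED)
                fprintf(stderr, "connect: refused (likely RST from the server)\n");
            else
                fprintf(stderr, "connect: %s\n", strerror(errno));
            close(fd);
            return -1;
        }
        return fd;
    }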
If you are connecting faster than your server is calling accept(), the queue of pending connections may be full. The maximum queue length is set by the second argument to listen() in the server, or by the value of sysctl net.core.somaxconn (normally 128) if that is lower.
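As a purely illustrative sketch (the function name and the 1024 backlog value are assumptions, not taken from the question), this is the kind of listener setup where the backlog matters; the kernel still caps the effective queue at net.core.somaxconn:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Hypothetical listener setup: ask for a deeper accept queue so bursts
       of connects are not refused before accept() can drain them. */
    int make_listener(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return -1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            close(fd);
            return -1;
        }

        /* 1024 is illustrative; the kernel uses min(backlog, net.core.somaxconn). */
        if (listen(fd, 1024) < 0) {
            perror("listen");
            close(fd);
            return -1;
        }
        return fd;
    }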
Your limitation comes from the Linux per-user limits. If not otherwise specified, Linux limits a process to 1024 open files. To change that permanently, edit /etc/security/limits.conf and add:
user soft nofile 16535
user hard nofile 16535
or, from the console, try:
ulimit -n 16535
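If editing limits.conf is not convenient, the process can also try to raise its own soft limit at startup. A minimal sketch, reusing the 16535 value above (the helper name raise_nofile is mine, not from any real code):

    #include <stdio.h>
    #include <sys/resource.h>

    /* Hypothetical startup helper: raise the soft RLIMIT_NOFILE to the
       requested value, capped at the hard limit. */
    int raise_nofile(rlim_t wanted)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
            perror("getrlimit");
            return -1;
        }
        rl.rlim_cur = (wanted < rl.rlim_max) ? wanted : rl.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &rl) < 0) {
            perror("setrlimit");
            return -1;
        }
        return 0;
    }

Call it early, e.g. raise_nofile(16535), before the server starts accepting connections.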
Regards
So, after a little more research, it looks like my server-side listen() has a queue depth (backlog) of 20. I am thinking that's the reason. Do any of you think that's the problem too?
Regards
I saw the comment you made with the close(sock_fd) statement in an error-handling routine.
Are you explicitly closing your sockets after they are used, with close() or shutdown()?
I would guess not. Do you actually have 1024+ concurrent active connections? You would have to have pthreads involved in order to do this. Is that correct?
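For completeness, a tiny illustrative sketch (not your code) of what releasing a finished connection looks like, since only close() actually gives the descriptor back to the process:

    #include <sys/socket.h>
    #include <unistd.h>

    /* Hypothetical teardown: shutdown() ends both directions of the
       conversation; close() returns the descriptor, so the count seen
       in lsof stops growing. */
    void finish_connection(int conn_fd)
    {
        shutdown(conn_fd, SHUT_RDWR);  /* optional, but makes the close explicit to the peer */
        close(conn_fd);                /* without this the fd stays counted against ulimit -n */
    }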