It seems that the server is limited to ~32720 sockets. I have tried every variable change I know of to raise this limit, but the server stays limited at 32720 open sockets, even though there is still 4 GB of free memory and the CPU is 80% idle.

Here's the configuration:

~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63931
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 798621
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 2048
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63931
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

net.netfilter.nf_conntrack_max = 999999
net.ipv4.netfilter.ip_conntrack_max = 999999
net.nf_conntrack_max = 999999

Any thoughts?

A: 

On GNU+Linux, the maximum is what you wrote. This number is (probably) stated somewhere in the networking standards. I doubt you really need so many sockets; you should optimize the way you use sockets instead of creating dozens all the time.

skalee
Each socket is a client connected to the server...
TheSquad
No, a socket is just a limited resource. Clients use sockets. It is not true that a socket equals a connected client, or that each client needs its own socket. It depends on the protocol. For example, TCP needs such an association (one socket per client) but UDP does not. Even when using TCP, who said that the connection must be continuous?
skalee
I meant that in our software one client = one socket. We use SSL, so UDP is out of the question, and the connection needs to be continuous.
TheSquad
+2  A: 

Which server are you talking about? It might have a hard-coded maximum, or run into other limits (max threads, running out of address space, etc.).

http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1 describes some of the tuning needed to handle a large number of connections, but that doesn't help if the server application itself limits them in one way or another.

nos
I'm talking about a Core i7 with 16 GB of RAM and a 160 GB SSD, running Debian. Good article, by the way; I'm not sure it will fix the issue, but it's good to know. I'll let you know how it goes.
TheSquad
Sorry, I didn't get what you were asking at first. The server application is software we wrote ourselves, with no built-in limit.
TheSquad
OK, i gotta call bullshit. A custom server app doesn't get 32k simultaneous clients unless it's made by a noteworthy company or does something shady. In the first case, you wouldn't need help -- someone who didn't understand scaling issues wouldn't have gotten hired.
cHao
Then call bullshit; I don't have to justify myself to get an answer from you. This issue is known to be tricky, and I'm not even sure anyone has a good answer to this question (how many sockets can a server handle at most?). The fact is that we have a lot more than 32K connections going on; it's only each server that is limited to 32K. Right now, across all the clustered servers, we have more than 1 million connections. We are looking for solutions to reduce the number of servers. That's it!
TheSquad
Umm, yeah, you *do* have to justify yourself to get an answer from me. SO doesn't pay me -- i'm here because i like solving problems. However, i'm not into helping people solve the wrong problem -- and so far, the problem seems more to be this supposed requirement for 32k+ simultaneous long-lived connections on one box, rather than a kernel and/or runtime limit that hardly anyone but stress testers even know exists. So unless i see that that's necessary, i'm going to continue to say "use fewer sockets".
cHao
@TheSquad As it's software you've written yourself, are you really sure there are no limitations? Are you using threads? select(), poll() or epoll()? What error do you get when you reach 32720 sockets? What language/API is it using?
nos
@cHao it's not that uncommon. 32k isn't really a lot; we've had people (a small five-person company) serving *a lot* more than that for a simple iPhone app they made.
nos
@nos: Constantly connected? I'm not saying it's unusual to serve 32k clients -- Google's or MS's web stats would make that number look positively puny -- but to have that many clients connected simultaneously, to one machine, is highly unusual in my experience.
cHao
@cHao It was using Comet; the clients stayed connected for about 3 minutes on average during peak hours.
nos
@nos: Yes, it is using threads (pthreads), but that is not the issue. @cHao: I'm not sure you want to help so much as understand what we do that involves so many clients, lol.
TheSquad
@cHao: If you have any way to use one socket for multiple clients at the same time without disconnecting them, then please advise; if not, the problem stays the same. I'm not looking for someone to tell me that something is broken in the design, but for a solution. If you don't believe me when I say we have a lot more than 32K clients on the server at the same time, then there's nothing more I can say.
TheSquad
@TheSquad I'm just asking to learn how you're reaching the limit, e.g. whether you max out the address space, exhaust the process/thread ID space (easy with threads), or what eventually fails (like specific socket errors).
nos
I know, nos; unfortunately, we can't test that right now, there are too many people on the servers. I will have to test it outside peak hours. But max-thread, pid_max, fd_max and stack-size should all be set up correctly; we ran a stress test before and got about 2^20 threads running on the server.
TheSquad
@nos, rest assured I'll post everything you need to know as soon as we are able to test it.
TheSquad
I've written a server that does nothing but accept connections til it can't anymore, and a client on the same machine that constantly connects til it can't anymore. With the sysctls from the answer applied, my only limitation was the local port range, which i'd widened to 50000 ports, but was only using from one IP (localhost). That's 50000 sockets each for server and client, meaning 100k sockets total, and it'd have been more if i cared to widen the port range more. But after the first try, things started flaking out around 27k, so i stopped.
cHao
+1  A: 

If you're considering an application where you believe you need to open thousands of sockets, you will definitely want to read about The C10k Problem. That page discusses many of the issues you will face as you scale up your number of client connections to a single server.

Greg Hewgill
The C10K problem is from 2003. With 32000 clients connected the server still performs well; it can handle much more, believe me!
TheSquad
You'll have to prove that.
Greg Hewgill
Do you really think a seven-year-old problem is still relevant with today's Core i7, 8 GB of RAM and two 1 Gb network links? As I said in my first post, with 32720 clients connected, CPU usage is still under 10%, and there is plenty of free memory (4 GB) to open more connections. Here are some ifstat rows:

        eth0
 KB/s in  KB/s out
   89.22    145.37
  126.97    136.15
  104.11    158.18
   84.17    123.62
   90.64    106.47
   93.17    125.98
   97.21    130.69
TheSquad
@TheSquad, aha, and most TCP stacks were written 30 years ago. Gigs of RAM have nothing to do with this, it's the client port range. You obviously have no clue, so do yourself a favor and listen to what experienced people have to say.
Nikolai N Fetissov
Do yourself a favor and read this, from someone experienced: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1
TheSquad
I pointed out the RAM because every connection uses SSL, and SSL sessions take RAM.
TheSquad
A: 

In net/socket.c the fd is allocated in sock_alloc_fd(), which calls get_unused_fd().

Looking at linux/fs/file.c, the only limit on the number of fds is sysctl_nr_open, which is capped at

int sysctl_nr_open_max = 1024 * 1024; /* raised later */

/// later...
sysctl_nr_open_max = min((size_t)INT_MAX, ~(size_t)0/sizeof(void *)) &
                         -BITS_PER_LONG;

and can be read with sysctl fs.nr_open, which gives 1M (1048576) by default here. So the fds are probably not your problem.

Edit: you've probably checked this as well, but would you care to share the output of

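#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
int main(void) {
    struct rlimit limit;
    if (getrlimit(RLIMIT_NOFILE, &limit) != 0) {
        perror("getrlimit");
        return 1;
    }
    /* rlim_t is unsigned and may be wider than int: cast for printf */
    printf("cur: %llu, max: %llu\n", (unsigned long long)limit.rlim_cur,
           (unsigned long long)limit.rlim_max);
    return 0;
}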

with us?

mvds
Yeah, the fds are fine; that was the first thing I checked. I'm more concerned about ports, but even there, there should be ~32000 more ports available.
TheSquad
Ports should be fine too. If you're running a server, it should be listening on one port, and all the clients would be connected to that same port number. Only a few protocols work differently -- with FTP being the only one i can come up with right off -- and that's because it uses a separate socket for data transfer.
cHao
Your question was about sockets, and those don't seem to be the problem. "Ports" cannot be a problem if you're the server and clients connect to you. Otherwise, they can be, and you may have to increase net.ipv4.ip_local_port_range. Please be a little more specific about the situation: what exactly fails, and with what return value?
mvds
Did you check rlimits? See the updated answer.
mvds
@cHao: the incoming port is the same, but connections are two-way; the outgoing port is not the same for each client.
TheSquad
@mvds: I will, thanks.
TheSquad
@mvds: the result is 798621 for both.
TheSquad
If you need one port per client, increase the port range and use multiple IPs. There are only 64k ports in 2 bytes ;-)
mvds
@mvds: I'm aware of that; the port range has been increased, but even so we don't reach the 64K limit, only 32K. However, we could bind the service to other local addresses, so that's not really an issue.
TheSquad
+2  A: 

If you're dealing with OpenSSL and threads, go check your /proc/sys/vm/max_map_count and try to raise it.

fedj