I am testing cogen on a Mac OS X 10.5 box using Python 2.6.1. I have a simple echo server, plus a client-pumper that creates 10,000 client connections as a test. 1,000, 5,000, etc. all work splendidly. However, at around 10,000 connections the server starts dropping random clients: the clients see 'connection reset by peer'.

Is there some basic networking knowledge I'm missing here?

Note that my system is configured to handle lots of open files (launchctl limit, sysctl (maxfiles, etc.), and ulimit -n have all been raised; been there, done that). I've also verified that cogen is picking kqueue under the covers.
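
(For reference, the per-process descriptor limit can also be sanity-checked from inside the process itself with the stdlib resource module; this is just a quick check, nothing cogen-specific:)

# fdcheck.py - sanity-check the descriptor limit this process inherited
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print "RLIMIT_NOFILE: soft=%s hard=%s" % (soft, hard)

# try to raise the soft limit toward what the test needs (10k sockets
# plus overhead); this can still fail if kern.maxfilesperproc is lower
wanted = 12000
if soft < wanted:
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        print "raised soft limit to", wanted
    except (ValueError, resource.error):
        print "could not raise the limit; check kern.maxfilesperproc"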

If I add a slight delay to the client connect() calls, everything works great. So my question is: why would a server under stress drop other clients when there is a high rate of new connections in a short period of time? Has anyone else run into this?
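
One suspect worth ruling out (a guess, not a confirmed diagnosis): the server below calls srv.listen(64), and on OS X the effective backlog is further capped by the kern.ipc.somaxconn sysctl (typically 128). A large enough burst of near-simultaneous connects can overflow that queue, which can surface on the client side as a reset. A minimal check, assuming the BSD sysctl name:

import os

# inspect the kernel's cap on listen backlogs (BSD/OS X sysctl name)
print os.popen('sysctl kern.ipc.somaxconn').read().strip()

# to go deeper, raise the sysctl first (the kernel silently clamps
# listen() backlogs to it):
#   sudo sysctl -w kern.ipc.somaxconn=10240
# and then change the server's srv.listen(64) to something like:
#   srv.listen(1024)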

For completeness' sake, here's my code.

Here is the server:

# echoserver.py

from cogen.core import sockets, schedulers, proactors
from cogen.core.coroutines import coroutine
import sys, socket

port = 1200

@coroutine
def server():
    srv = sockets.Socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    addr = ('0.0.0.0', port)
    srv.bind(addr)
    srv.listen(64)
    print "Listening on", addr
    while 1:
        conn, addr = yield srv.accept()
        # hand each accepted connection off to its own handler coroutine
        m.add(handler, args=(conn, addr))

client_count = 0

@coroutine
def handler(sock, addr):
    global client_count
    client_count += 1
    print "SERVER: [connect] clients=%d" % client_count
    fh = sock.makefile()
    yield fh.write("WELCOME TO (modified) ECHO SERVER !\r\n")
    yield fh.flush()
    try:
        while 1:
            line = yield fh.readline(1024)
            #print `line`
            if line.strip() == 'exit':
                yield fh.write("GOOD BYE")
                yield fh.close()
                raise sockets.ConnectionClosed('goodbye')
            yield fh.write(line)
            yield fh.flush()
    except sockets.ConnectionClosed:
        pass
    fh.close()
    sock.close()
    client_count -= 1
    print "SERVER: [disconnect] clients=%d" % client_count

m = schedulers.Scheduler()
m.add(server)
m.run()

And here is the client:

# echoc.py

import sys, os, traceback, socket, time
from cogen.common import *
from cogen.core import sockets

port, conn_count = 1200, 10000
clients = 0

@coroutine
def client(num):
    sock = sockets.Socket()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    reader = None
    try:
        try:
            # remove this sleep and we start to see 
            # 'connection reset by peer' errors
            time.sleep(0.001)
            yield sock.connect(("127.0.0.1", port))
        except Exception:
            print 'Error in client # ', num
            traceback.print_exc()
            return
        global clients
        clients += 1
        print "CLIENT #=%d [connect] clients=%d" % (num,clients)
        reader = sock.makefile('r')
        while 1:
            line = yield reader.readline(1024)
    except sockets.ConnectionClosed:
        pass
    except:
        print "CLIENT #=%d got some other error" % num
        traceback.print_exc()
    finally:
        if reader: reader.close()
        sock.close()
        clients -= 1
        print "CLIENT #=%d [disconnect] clients=%d" % (num,clients)

m = Scheduler()
for i in range(0, conn_count):
    m.add(client, args=(i,))
m.run()
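
As a cross-check, the same burst can be generated with plain blocking sockets, taking cogen out of the picture entirely; a crude harness (error tallies only, not part of the test proper):

# burst.py - plain-socket repro: hammer the echo server with connects
# and tally which errors come back, to see if the resets happen below cogen
import socket

port, n = 1200, 10000
socks, errs = [], {}
for i in range(n):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5.0)
    try:
        s.connect(('127.0.0.1', port))
        s.recv(64)          # read the welcome banner; resets often show
        socks.append(s)     # up here rather than at connect() time
    except socket.error, e:
        errs[str(e)] = errs.get(str(e), 0) + 1
        s.close()
print "connected=%d errors=%r" % (len(socks), errs)
for s in socks:
    s.close()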

Thanks for any information!

+2  A: 

Python's socket I/O sometimes suffers from 'connection reset by peer' errors. It has to do with the Global Interpreter Lock and how threads are scheduled. I have blogged some references on the subject.

The time.sleep(0.001) appears to be the recommended solution because it adjusts thread scheduling and allows the socket I/O to finish.
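
If occasional resets under load turn out to be unavoidable, the client can also treat them as transient and retry; a minimal sketch with plain sockets (the retry count and delays here are made up for illustration):

import socket, errno, time

def connect_with_retry(addr, tries=5, delay=0.01):
    # retry transient resets/refusals, backing off between attempts
    for attempt in range(tries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect(addr)
            return s
        except socket.error, e:
            s.close()
            if e.errno not in (errno.ECONNRESET, errno.ECONNREFUSED):
                raise
            time.sleep(delay * (2 ** attempt))  # exponential backoff
    raise socket.error(errno.ECONNRESET, 'gave up after %d tries' % tries)

sock = connect_with_retry(('127.0.0.1', 1200))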

S.Lott
So what happens when I have clients in the wild that can connect at any time? Granted, I'm not *expecting* 1,000 connection attempts per second, but there's no way I'm going into production without understanding this 100%. I suppose I could rate-limit connections with iptables.
z8000
"at around 10,000 connections, the server starts dropping random clients" More than that, I doubt your server can actually do much with 10,000 concurrent connections. Can you really open and serve 10,000 files? Can you really do 10,000 DB queries?
S.Lott
S.Lott: why do you doubt that? I haven't disclosed anything about my server. FWIW, I'm not serving files, but nginx wouldn't bat an eye at that number. Also, ejabberd can easily route and relay 100K+ connections (XMPP). My server is a game server and does little to no I/O (disk, database, or other).
z8000
@2pence: since your question disclosed nothing, I was forced to assume. Since I was forced to assume, there was no possible way for me to assume correctly. Please update your question with new facts.
S.Lott
S.Lott - Not sure why you chimed in at all then. Don't assume. I didn't need to disclose my server's end-goals. I merely had a question about part of the server.
z8000
@2pence: Sorry. Your first comment indicated that you had additional questions, but it was missing some facts. That's why I chimed in. I apologize for trying to understand your comment. If it was meant to be opaque, I missed the point.
S.Lott
okie doke - nevermind!
z8000
A: 

See the comments on my own post.

z8000