Hello there,

I have a fairly simple problem here. I need to communicate with a lot of hosts simultaneously, but I do not really need any synchronization because each request is pretty self-sufficient.

Because of that, I chose to work with asynchronous sockets, rather than spamming threads. Now I do have a little problem:

The async stuff works like a charm, but when I connect to 100 hosts and get 100 timeouts (timeout = 10 secs), I wait 1000 seconds just to find out that all my connections failed.

Is there any way to also get non-blocking socket connects? My socket is already set to nonBlocking, but calls to connect() are still blocking.

Reducing the timeout is not an acceptable solution.

I am doing this in Python, but I guess the programming language doesn't really matter in this case.

Do I really need to use threads?

A: 

Did you look at the asyncore module? Might be just what you need.

maksymko
I am using this, and it still blocks on connect.
Tom
+3  A: 

Use the select module. This allows you to wait for I/O completion on multiple non-blocking sockets. Here's some more information on select. From the linked-to page:

In C, coding select is fairly complex. In Python, it's a piece of cake, but it's close enough to the C version that if you understand select in Python, you'll have little trouble with it in C.

ready_to_read, ready_to_write, in_error = select.select(
                  potential_readers, 
                  potential_writers, 
                  potential_errs, 
                  timeout)

You pass select three lists: the first contains all sockets that you might want to try reading; the second all the sockets you might want to try writing to, and the last (normally left empty) those that you want to check for errors. You should note that a socket can go into more than one list. The select call is blocking, but you can give it a timeout. This is generally a sensible thing to do - give it a nice long timeout (say a minute) unless you have good reason to do otherwise.

In return, you will get three lists. They have the sockets that are actually readable, writeable and in error. Each of these lists is a subset (possibly empty) of the corresponding list you passed in. And if you put a socket in more than one input list, it will only be (at most) in one output list.

If a socket is in the output readable list, you can be as-close-to-certain-as-we-ever-get-in-this-business that a recv on that socket will return something. Same idea for the writeable list. You'll be able to send something. Maybe not all you want to, but something is better than nothing. (Actually, any reasonably healthy socket will return as writeable - it just means outbound network buffer space is available.)

If you have a "server" socket, put it in the potential_readers list. If it comes out in the readable list, your accept will (almost certainly) work. If you have created a new socket to connect to someone else, put it in the potential_writers list. If it shows up in the writeable list, you have a decent chance that it has connected.
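
A minimal sketch of that last point, assuming numeric IP addresses (so no DNS lookup is involved) and plain TCP. The names connect_all, pending and results are made up for this illustration. Every connect is started first, and select then waits for all of them together, so 100 dead hosts cost roughly one timeout in total rather than 100:

```python
import errno
import select
import socket
import time


def connect_all(addresses, timeout=10.0):
    """Start every connect at once, then wait for all of them together.

    addresses: iterable of (ip, port) tuples with numeric IPs.
    Returns {address: error_code}, where 0 means the connect succeeded.
    """
    pending = {}   # socket -> address, connects still in flight
    results = {}   # address -> error code

    for addr in addresses:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        err = s.connect_ex(addr)  # returns an error code instead of raising
        if err in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            pending[s] = addr
        else:
            results[addr] = err
            s.close()

    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        socks = list(pending)
        # A socket turns writable (or, on some platforms, shows up in the
        # error list) once its connect attempt has finished either way.
        _, writable, in_error = select.select([], socks, socks, remaining)
        for s in set(writable) | set(in_error):
            addr = pending.pop(s)
            # SO_ERROR holds the outcome of the connect: 0 on success.
            results[addr] = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            s.close()

    # Whatever is still pending did not answer before the shared deadline.
    for s, addr in pending.items():
        results[addr] = errno.ETIMEDOUT
        s.close()
    return results
```

The sockets are closed here only to keep the sketch short; a real client would keep the successful ones and move them to the reader list. select also has a platform-dependent limit on how many descriptors it accepts per call.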

Vinay Sajip
He specifically says that he's being blocked on connect(). Select only tells you what is readable or writable.
JimB
See the last para of my answer. With `select` multiplexing, you don't need to wait 1000 seconds before doing useful work. With a short timeout, you can still do useful work if all endpoints are not connected, with only a short wait. Twisted is of course an alternative, but as you said yourself "it's a bit heavy to get into".
Vinay Sajip
Ahh, I see the problem ...he set a timeout, which means the socket *has* to be blocking.
JimB
I haven't set anything explicitly; I was using Python's asyncore module, which seems to be more or less a wrapper around select(). I have created another short test script that just creates a socket and sets it to non-blocking, but it still blocks on connecting, just not on reading.
Tom
@Tom - see, you hadn't mentioned that you were using asyncore with the timeout option, so my logical guess was that you were using socket.settimeout(), which puts the socket back into blocking mode. What is your platform and Python version? connect doesn't block on my systems with setblocking(0).
JimB
A: 

You need to parallelize the connects as well, since the sockets block when you set a timeout. Alternatively, you could not set a timeout, and use the select module.
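
A small sketch of why a timeout and non-blocking mode exclude each other, using only the standard socket module (the variable names are just for this illustration). A socket has a single timeout setting, and setblocking() is shorthand for two of its values, so whichever call comes last wins:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

s.settimeout(10)        # timeout mode: connect() may wait up to 10 seconds
mode_timeout = s.gettimeout()

s.setblocking(False)    # same as settimeout(0.0): connect() does not wait
mode_nonblocking = s.gettimeout()

s.setblocking(True)     # same as settimeout(None): plain blocking mode
mode_blocking = s.gettimeout()

s.close()
print(mode_timeout, mode_nonblocking, mode_blocking)  # 10.0 0.0 None
```

So calling settimeout(10) after setblocking(0) silently undoes the non-blocking mode.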

You can do this with the dispatcher class in the asyncore module. Take a look at the basic HTTP client example. Multiple instances of that class won't block each other on connect. You can do this just as easily using threads, which I think makes tracking socket timeouts easier, but since you're already using asynchronous methods you might as well stay on the same track.

As an example, the following works on all my Linux systems:

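# Python 3 syntax. Note that asyncore was removed from the standard
# library in Python 3.12, so this needs an older interpreter.
import asyncore, socket

class client(asyncore.dispatcher):
    def __init__(self, host):
        self.host = host
        asyncore.dispatcher.__init__(self)
        # create_socket() puts the new socket into non-blocking mode
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        # Starts the connect; completion is reported via handle_connect()
        self.connect((host, 22))

    def handle_connect(self):
        print('Connected to', self.host)

    def handle_close(self):
        self.close()

    def handle_write(self):
        # Nothing to send; we only want to read the SSH banner
        self.send(b'')

    def handle_read(self):
        print(' ', self.recv(1024).decode('ascii', 'replace'))

# One dispatcher per host: cluster50 through cluster99
clients = []
for i in range(50, 100):
    clients.append(client('cluster%d' % i))

# Runs the select() loop until every dispatcher has closed
asyncore.loop()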

Among cluster50 through cluster99, numerous machines are unresponsive or nonexistent. The script immediately starts printing:

Connected to cluster50
  SSH-2.0-OpenSSH_4.3

Connected to cluster51
  SSH-2.0-OpenSSH_4.3

Connected to cluster52
  SSH-2.0-OpenSSH_4.3

Connected to cluster60
  SSH-2.0-OpenSSH_4.3

Connected to cluster61
  SSH-2.0-OpenSSH_4.3

...

This, however, does not take into account getaddrinfo, which has to block. If you're having issues resolving the DNS queries, everything has to wait. You probably need to gather the DNS queries separately on your own, and use the IP addresses in your async loop.
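
One way to gather the lookups separately is a small thread pool used only for name resolution, before the async loop starts. This is a sketch under assumptions: IPv4 only, the first returned address is good enough, and the helper names resolve and resolve_all are invented here. It does bring threads back, but only for the DNS step:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve(host, port):
    """Return the first IPv4 address for host, or None if the lookup fails."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return None
    # Each entry is (family, type, proto, canonname, (ip, port)).
    return infos[0][4][0]


def resolve_all(hosts, port, workers=32):
    """Resolve all hosts in parallel; returns {host: ip_or_None}."""
    hosts = list(hosts)
    if not hosts:
        return {}
    with ThreadPoolExecutor(max_workers=min(workers, len(hosts))) as pool:
        # Each blocking getaddrinfo call runs in its own worker thread,
        # so one slow or dead name no longer holds up the others.
        ips = pool.map(lambda h: resolve(h, port), hosts)
        return dict(zip(hosts, ips))
```

Hosts that map to None can be dropped up front, and the remaining IP addresses handed to the dispatcher instances.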

If you want a bigger toolkit than asyncore, take a look at Twisted Matrix. It's a bit heavy to get into, but it is the best network programming toolkit you can get for Python.

JimB
Alright, I have to apologize here. I took the code right from the Python docs, so it wasn't my code, and I took it for granted that it was correct. And it didn't work. It has happened to me frequently that people gave me advice which they didn't even verify themselves. I could never have guessed that my OS would be the problem instead of the code, so I thought you were just one more guy thinking he's smart and copy-pasting documentation code without even checking whether it works. Sorry again for that. I threw away 3 complete versions today, wasting 6 hours, to find that MacOS was the problem.
Tom
Btw, I tested this again together with a friend on his Linux box, and even getaddrinfo doesn't seem to block there. We get an error: [Errno 115] Operation now in progress. So theoretically even asyncore with non responsive hosts could work in linux.
Tom
@Tom - np, I agree there are tons of uninformed answers around here, especially in the non-Windows fields. What's worse is that the teams of uninformed end up up-voting each other, making it hard to get correct answers in.
JimB
"We get an error: [Errno 115] Operation now in progress. So theoretically even asyncore with non responsive hosts could work in linux" - I'm pretty sure it does, I just couldn't get my DNS broken enough to hang in order to verify it.
JimB
+4  A: 

Use twisted.

It is an asynchronous networking engine written in Python, supporting numerous protocols, and you can add your own. It can be used to develop clients and servers. It doesn't block on connect.

nosklo
Twisted brings such happiness. I work with it every day and try to convince those struggling with concurrency that it will make their lives so much easier. Of course, my coworkers at least get to see the difference.
Dustin
I have used Twisted before. It's quite nice, but the documentation is twisted as well. Also, it will be hard to integrate my source into that. Are you certain that it does not block on connects? I might try going for that then.
Tom
+4  A: 

Unfortunately there is no example code which shows the bug, so it's a bit hard to see where this block comes from.

He does something like:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setblocking(0)
s.connect(("www.nonexistingname.org", 80))

The socket module uses getaddrinfo internally, which is a blocking operation, especially when the hostname does not exist. A standards-compliant DNS client will wait some time to see whether the name really does not exist or whether some slow DNS servers are just involved.

The solution is to connect to IP addresses only, or to use a DNS client which allows non-blocking requests, like pydns.
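
To make sure only IP addresses ever reach connect(), a guard along these lines may help. It is a sketch: the function name is made up, and it relies on the standard AI_NUMERICHOST flag, which tells getaddrinfo to accept only numeric addresses and to skip name resolution:

```python
import socket


def is_numeric_address(host, port=80):
    """True if host is an IP literal; never triggers a DNS lookup."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM,
                           flags=socket.AI_NUMERICHOST)
    except socket.gaierror:
        # Not a numeric address; a hostname is rejected without resolving it.
        return False
    return True
```

Anything that fails the check can be sent to a separate resolver before it is handed to the non-blocking connect.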

ebo
That pretty much cuts to the heart of the problem. It seems I am having DNS problems. The behaviour of my app (at least in the initial stage) is pretty similar to a port scanner: I depend on very fast results, whether the connect works or not. Using getaddrinfo on nonexistent hostnames blocks even for non-blocking sockets, which is bad (for me). I might also connect to a lot of nonexistent hosts, and I can't afford to wait 10 secs on each nonexistent host.
Tom