
I'm trying to write some Python code that will establish an invisible relay between two TCP sockets. My current technique is to set up two threads, each one reading and subsequently writing 1kb of data at a time in a particular direction (i.e. 1 thread for A to B, 1 thread for B to A).

This works for some applications and protocols, but it isn't foolproof - sometimes particular applications will behave differently when running through this Python-based relay. Some even crash.

I think this is because when I finish performing a read on socket A, the application on that side considers its data to have already arrived at B, when in fact I - the devious man in the middle - have yet to send it on to B. If B isn't ready to receive the data (so that my send() blocks for a while), we are now in a state where A believes it has successfully sent data to B, yet I am still holding that data, waiting for the send() call to complete. I suspect this is the cause of the behavioural differences I've seen in some applications while using my current relaying code. Have I missed something, or does that sound correct?

If so, my real question is: is there a way around this problem? Is it possible to only read from socket A when we know that B is ready to receive data? Or is there another technique that I can use to establish a truly 'invisible' two-way relay between [already open & established] TCP sockets?
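For reference, the two-thread technique described above can be sketched roughly like this (the function names and the 1 KB buffer size are illustrative, not the asker's actual code):

```python
import socket
import threading

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst, 1 KB at a time, until EOF."""
    while True:
        data = src.recv(1024)
        if not data:           # peer closed its side of the connection
            break
        dst.sendall(data)      # may block while we still "hold" src's data

def relay(a: socket.socket, b: socket.socket) -> None:
    # One thread per direction: A -> B and B -> A.
    t1 = threading.Thread(target=pipe, args=(a, b), daemon=True)
    t2 = threading.Thread(target=pipe, args=(b, a), daemon=True)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```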

+1  A: 

I don't think that's likely to be your problem.

In general, the sending application can't tell when the receiving application actually calls recv() to read the data: the sender's send() may have completed, but the TCP implementations in the source & destination OS will be doing buffering, flow control, retransmission, etc.

Even without your relay in the middle, the only way for A to "consider its data to have already arrived at B" is to receive a response from B saying "yep, I got it".

David Gelhar
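The point that send() can complete long before the peer ever calls recv() is easy to observe with a local socket pair (a minimal sketch; the function name is made up for illustration):

```python
import socket

def send_completes_early() -> int:
    """Show that send() returns once the kernel has buffered the data."""
    a, b = socket.socketpair()
    # b has not called recv() yet, and never needs to for send() to succeed:
    # the kernel accepts the bytes into its buffer and returns immediately.
    n = a.send(b"hello")
    a.close()
    b.close()
    return n
```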
+4  A: 

Is it possible to only read from socket A when we know that B is ready to receive data?

Sure: use select.select on both sockets A and B (if it returns saying only one of them is ready, use it on the other one), and only read from A and write to B when you know they're both ready. E.g.:

import select

def fromAtoB(A, B):
    # select() returns three lists (readable, writable, exceptional).
    r, w, _ = select.select([A], [B], [])
    if not r:
        select.select([A], [], [])   # block until A has data to read
    elif not w:
        select.select([], [B], [])   # block until B can accept data
    B.sendall(A.recv(4096))
Alex Martelli
I've changed my code to use this, but the original problem is still there - the app I'm testing with still behaves differently with a relay in place. Also, putting this in a while loop causes Python to use a lot of CPU cycles. I should note that the data is being sent with fairly high throughput. If you have any other suggestions I'd love to hear them.
flukes1
I've fixed the CPU issue but still no dice on my original problem. Here's my code: http://pastie.org/910900
flukes1
(1) I'm surprised to hear about the high CPU consumption, since my code doesn't spend any until the data's ready -- maybe data's being sent in tiny packets, but even then, if you want to relay it very promptly, it's hard to think how to improve this (unless you're willing to add, potentially, very long latencies). (2) I'm not surprised that this has little to do with your actual problem -- I just answered the question of yours that I quoted, about reading from A only when we know B is ready for writing; behavior differences may be due to A checking its peer, which you can't fake.
Alex Martelli
Indeed; but I'm actually working with UNIX sockets and there's no way to check their legitimacy, as far as I know. I move the original socket to a safe location before creating my own fake one in its place. Truly baffled by this!
flukes1
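For completeness, a bidirectional relay driven by a single select() loop (a sketch under the same assumptions; not the code from the pastie above) blocks until one side actually has data, so it should not spin the CPU:

```python
import select
import socket

def relay(a: socket.socket, b: socket.socket) -> None:
    """Relay both directions until either side closes."""
    sockets = [a, b]
    other = {a: b, b: a}
    while True:
        # Block here until at least one socket is readable.
        readable, _, _ = select.select(sockets, [], [])
        for s in readable:
            data = s.recv(4096)
            if not data:          # one side closed: stop relaying
                return
            other[s].sendall(data)
```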
A: 

If you're using blocking operations (not async), you'll probably stumble on many problems. My advice is to use asynchronous IO, and for that I suggest using pyevent (python bindings for libevent).

IMO, your concern makes sense if there's very high throughput, which may cause a general slowdown. In that extraordinary situation, your application could drop data due to full buffers, and retransmission (protocol dependent) is very likely to occur. Another possibility in the same scenario is that endpoint applications with strict timeout handling might behave differently.

Additionally, a common problem I see in this kind of application is a missing flush upon disconnect. For example, if endpoint A sends data and disconnects right after that, you must send this data to endpoint B prior to closing the connection with B.

jweyrich
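The flush-on-disconnect behaviour described above can be handled with a TCP half-close: when one endpoint signals EOF, forward any remaining data and then shutdown() the write side toward the other peer rather than closing the whole connection (a sketch; the helper name is hypothetical):

```python
import socket

def forward_chunk(src: socket.socket, dst: socket.socket) -> bool:
    """Relay one chunk; on EOF, propagate a half-close. Returns False at EOF."""
    data = src.recv(4096)
    if data:
        dst.sendall(data)               # flush before ever signalling EOF
        return True
    # src sent FIN: tell dst's peer there is no more data in this
    # direction, but leave the opposite direction open for replies.
    dst.shutdown(socket.SHUT_WR)
    return False
```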
+1  A: 

Perhaps the application you're proxying is poorly written.

For instance, if I call recv(fd, buf, 4096, 0); I'm not promised 4096 bytes. The system makes a best-effort to provide it.

If 1k isn't a multiple of your application's recv or send sizes, and the application is broken, then grouping the data sent into 1k blocks will break the app.

rescrv
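Robust stream handling of the short-read behaviour described above usually means looping until the expected byte count has arrived; something like this hypothetical helper:

```python
import socket

def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have arrived (or EOF)."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)    # may return fewer bytes than asked
        if not chunk:                   # connection closed early
            raise ConnectionError(
                "socket closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```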
I see. Is there a way around that?
flukes1
Encourage the application writer to write better software. If you're trying to make a generic proxy, then no, there isn't. If it is for a particular application, try emulating the sizes of send and recv used by the client and server.
rescrv