I have a server written in C, and I want to write a client in Python. When the Python client wants to send a file, it sends the string "send some_file", followed by the file's contents, and then the string "end some_file". Here is my client code:


file = sys.argv[1]
host = sys.argv[2]
port = int(sys.argv[3])
sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
sock.connect((host,port))
send_str = "send %s" % file
end_str = "end %s" % file
sock.send(send_str)
sock.send("\n")
sock.send(open(file).read())
sock.send("\n")
sock.send(end_str)
sock.send("\n")

The problem is this:

  • the server receives the "send some_file" string from the first recv

  • at the second recv, the file's contents and the "end some_file" string arrive together

In the server code, the buffer's size is 4096 bytes. I first noticed this bug when trying to send a file that's smaller than 4096 bytes. How can I make sure that the server receives the strings independently?

+5  A: 

With socket programming, even if you do 2 independent sends, it doesn't mean that the other side will receive them as 2 independent recvs.

One simple solution that works for both strings and binary data is to: First send the number of bytes in the message, then send the message.

Here is what you should do for each message whether it is a file or a string:

Sender side:

  • Send 4 bytes that hold the number of bytes in the following send
  • Send the actual data

Receiver side:

  • Loop, blocking on a read until all 4 length bytes have arrived
  • Then block on a read for the number of bytes specified in those 4 bytes to get the data
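
The sender and receiver steps above can be sketched roughly as follows. This is a minimal illustration, assuming Python 3 and a 4-byte big-endian unsigned length header; the helper names send_msg, recv_exactly and recv_msg are made up for this example, and the C server would need matching logic on its side.

```python
import socket
import struct

def send_msg(sock, payload):
    """Send one message: a 4-byte big-endian length, then the payload bytes."""
    header = struct.pack("!I", len(payload))
    # sendall keeps writing until every byte has been handed to the OS
    sock.sendall(header + payload)

def recv_exactly(sock, n):
    """Block until exactly n bytes have been read; raise if the peer closes."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Read the 4-byte length header, then read that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)
```

With these helpers, a client could call send_msg(sock, b"send some_file"), then send_msg(sock, file_bytes), and the receiver would get each one back from its own recv_msg call, no matter how TCP splits or merges the underlying bytes.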

Along with the 4-byte length header I mentioned above, you could also add a constant-size command-type header (an integer again) that describes what's in the data that follows.

You could also consider using a protocol like HTTP which already does a lot of the work for you and has nice wrapper libraries.

Brian R. Bondy
how can I make sure that it won't be sent together with another string?
Geo
updated my answer with an example
Brian R. Bondy
you're right! this is the way to go! thanks
Geo
the telnetlib module implements the Telnet Protocol - consider if HTTP is not the answer.
gimel
@Geo: It is always sent as a batch of bytes -- the boundaries are determined inside TCP/IP. You read a size; then you read that many bytes. Then you read a size, etc. The TCP/IP packets and buffers will have nothing to do with the sizes you specify. It WILL be run together.
S.Lott
A: 

TCP/IP data is buffered, more-or-less randomly.

It's just a "stream" of bytes. If you want, you can read it as though it's delimited by '\n' characters. However, it is not broken into meaningful chunks; nor can it be. It must be a continuous stream of bytes.

How are you reading it in C? Are you reading up to a '\n'? Or are you simply reading everything in the buffer?

If you're reading everything in the buffer, you should see the lines buffered more-or-less randomly.

If you read up to a '\n', however, you'll see each line one at a time.
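
As a rough sketch of reading up to a '\n', assuming Python 3: the hypothetical LineReader class below keeps whatever recv returns in a buffer and hands back one line at a time. It only shows the idea; it does not address what happens when the file contents themselves contain newlines.

```python
import socket

class LineReader:
    """Buffer recv() data and return one newline-terminated line at a time."""

    def __init__(self, sock, bufsize=4096):
        self.sock = sock
        self.bufsize = bufsize
        self.buffer = b""

    def readline(self):
        """Return the next line without its newline, or None once the peer has closed."""
        while b"\n" not in self.buffer:
            chunk = self.sock.recv(self.bufsize)
            if not chunk:
                # Peer closed the connection: return leftover data once, then None
                if self.buffer:
                    line, self.buffer = self.buffer, b""
                    return line
                return None
            self.buffer += chunk
        # Split off the first complete line and keep the rest for the next call
        line, _, self.buffer = self.buffer.partition(b"\n")
        return line
```

However the bytes were grouped when sent, each readline call returns exactly one line, because the grouping is rebuilt from the delimiter rather than from the recv boundaries.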

If you want this to really work, you should read http://www.w3.org/Protocols/rfc959/. This shows how to transfer files simply and reliably: use two sockets. One for the commands, the other for the data.

S.Lott
This answer is almost entirely completely wrong
Brian R. Bondy
Why? 1) FTP should not be a model for all socket programming, it is a very old protocol and there is no modern reason for having 2 sockets when 1 is fine. 2) having 2 sockets will lead to problems with NAT traversal. 3) The stuff about \n is completely wrong and has nothing to do with TCP/IP
Brian R. Bondy
You know, the part about the two sockets might be a good advice.
Geo
It is not, please read up on this SO question: http://stackoverflow.com/questions/626823/in-protocol-design-why-would-you-ever-use-2-ports
Brian R. Bondy
@S.Lott: You may have perhaps used a library that could read 1 \n at a time, but this has nothing to do with TCP/IP. And how this library was implemented was to read an entire buffer at a time and to only return to you the first chars up to the \n. And buffers the rest for the next read.
Brian R. Bondy
@Brian R. Bondy: socket reading requires a buffer size, and 1 is a legitimate buffer size. The protocol stack is perfectly capable of delivering individual bytes from the incoming stream. If there are none, the read blocks. If there is data available from a packet, you get that data.
S.Lott
@S.Lott: I think I wasn't clear in my previous statement. I meant that perhaps you used a library that could read up to 1 \n at a time. Yes I know you can specify a size obviously, hence my answer. My main problem with your answer is about your view of a TCP stream as delimited by \n characters.
Brian R. Bondy
Sure you can have a \n in your TCP stream, but it is not valid to think this is required. The other problem was with advising him to implement FTP, which is not the best protocol design as a solution to a simple problem.
Brian R. Bondy
@Brian R. Bondy: The example protocol uses \n. Therefore, the stream can be seen as \n-delimited. As is true for ALL TCP/IP protocols, delimiters aren't part of TCP/IP, they're part of the protocol. The question shows a \n-based protocol, which is trivial to implement.
S.Lott
S.Lott that's what I was trying to say above.... that delimiters aren't part of TCP/IP protocols. We agree on this aspect. Please clarify your answer to indicate this as you only mention FTP later on so it is confusing.
Brian R. Bondy
In your above answer, "It's" is defined as TCP/IP. Instead of "It's" it would be better to say: A TCP stream can be viewed as...
Brian R. Bondy
Correction to my original criticism too, it should read port instead of socket. FTP uses 2 ports which there is no good reason for. 2 sockets spawned from the same port wouldn't be as bad.
Brian R. Bondy
A: 

In Socket Programming HOWTO, there is this quote and a code example:

A protocol like HTTP uses a socket for only one transfer. The client sends a request, then reads a reply. That's it. The socket is discarded. This means that a client can detect the end of the reply by receiving 0 bytes.

But if you plan to reuse your socket for further transfers, you need to realize that there is no "EOT" (End of Transfer) on a socket. I repeat: if a socket send or recv returns after handling 0 bytes, the connection has been broken. If the connection has not been broken, you may wait on a recv forever, because the socket will not tell you that there's nothing more to read (for now). Now if you think about that a bit, you'll come to realize a fundamental truth of sockets: messages must either be fixed length (yuck), or be delimited (shrug), or indicate how long they are (much better), or end by shutting down the connection. The choice is entirely yours, (but some ways are righter than others).

Assuming you don't want to end the connection, the simplest solution is a fixed length message:

import socket

MSGLEN = 4096  # assumed fixed message length; both ends must agree on it

class mysocket:
    '''demonstration class only
      - coded for clarity, not efficiency'''
    def __init__(self, sock=None):
        if sock is None:
            self.sock = socket.socket(
                socket.AF_INET, socket.SOCK_STREAM)
        else:
            self.sock = sock
    def connect(self, host, port):
        self.sock.connect((host, port))
    def mysend(self, msg):
        # msg must be exactly MSGLEN bytes long
        totalsent = 0
        while totalsent < MSGLEN:
            sent = self.sock.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")
            totalsent = totalsent + sent
    def myreceive(self):
        # keep calling recv until a full MSGLEN-byte message has arrived
        msg = b''
        while len(msg) < MSGLEN:
            chunk = self.sock.recv(MSGLEN - len(msg))
            if not chunk:
                raise RuntimeError("socket connection broken")
            msg = msg + chunk
        return msg
karlcow
the code you just pasted won't work.
Geo
small point about HTTP related to this message: HTTP has something called persistent connections. That means that the same connection is used for multiple request/responses. The header to change this behavior is: Connection: close
Brian R. Bondy
A: 

Possibly using

sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

may help send each packet when you want it, because it disables Nagle's algorithm. Most TCP stacks use Nagle's algorithm to join several small writes into one packet (and it is on by default, I believe).

George Shore
This will not guarantee that # of reads == # of writes though
Brian R. Bondy
Nothing guarantees that the original writes match the resulting reads. It's a single stream of bytes
S.Lott
@S.Lott: I wouldn't say nothing guarantees that. You could only send 1 byte and then close the socket, then you would be sure you have only 1 read which would give what was sent or a socket error. So in that case # of reads == # of writes.
Brian R. Bondy
+1  A: 

There are two much simpler ways I can think of in which you can solve this. Both involve some changes in the behaviors of both the client and the server.

The first is to use padding. Let's say you're sending a file. What you would do is read the file, encode this into a simpler format like Base64, then send enough space characters to fill up the rest of the 4096-byte 'chunk'. What you would do is something like this:

from io import BytesIO
import base64
import socket
import sys

CHUNK_SIZE = 4096 # bytes

# Extract the file name and server address from the command-line arguments
filename = sys.argv[1]
host = sys.argv[2]
port = int(sys.argv[3])
# Make the socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
# Prepare the message to send
send_str = ("send %s" % (filename,)).encode()
end_str = ("end %s" % (filename,)).encode()
# Read the file in binary mode so any content survives unchanged
with open(filename, 'rb') as f:
    data = f.read()
encoded_data = base64.b64encode(data)
encoded_fp = BytesIO(encoded_data)
# sendall keeps writing until every byte has been handed to the OS
sock.sendall(send_str + b'\n')
chunk = encoded_fp.read(CHUNK_SIZE)
while chunk:
    sock.sendall(chunk)
    if len(chunk) < CHUNK_SIZE:
        # Pad the final chunk with spaces up to CHUNK_SIZE
        sock.sendall(b' ' * (CHUNK_SIZE - len(chunk)))
    chunk = encoded_fp.read(CHUNK_SIZE)
sock.sendall(b'\n' + end_str + b'\n')

This example seems a little more involved, but it lets the server keep reading data in 4096-byte chunks, and all it has to do is Base64-decode the data on the other end (C libraries for Base64 decoding are available). A Base64 decoder that skips whitespace ignores the extra spaces, and the format can handle both binary and text files. (What would happen, for example, if a file contained the "end filename" line? It would confuse the server.)
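
To illustrate the padding and decoding steps without a network, here is a small sketch assuming Python 3. The function names pad_chunks and decode_chunks are invented for this example. The decoder strips the spaces explicitly instead of relying on the Base64 library to skip them.

```python
import base64

CHUNK_SIZE = 4096  # must match the chunk size the sender uses

def pad_chunks(encoded, chunk_size=CHUNK_SIZE):
    """Split Base64 bytes into fixed-size chunks, space-padding the last one."""
    chunks = []
    for i in range(0, len(encoded), chunk_size):
        chunk = encoded[i:i + chunk_size]
        chunks.append(chunk.ljust(chunk_size, b" "))
    return chunks

def decode_chunks(chunks):
    """Join the received chunks, drop the space padding, and Base64-decode."""
    joined = b"".join(chunks)
    return base64.b64decode(joined.replace(b" ", b""))
```

Because the Base64 alphabet contains neither spaces nor newlines, the padding and the "end filename" line can never be mistaken for file data.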

The other approach is to prefix the file with its length. So for example, instead of sending "send filename" you might send "send 4192 filename" to specify that the length of the file is 4192 bytes. The client would have to build the send_str based on the length of the file (as read into the data variable in the code above), and would not need to use Base64 encoding, as the server would not try to interpret any "end filename" text appearing in the body of the sent file. This is what happens in HTTP; the Content-Length HTTP header is used to specify how long the sent data is. An example client might look like this:

import socket
import sys

# Extract the file name and server address from the command-line arguments
filename = sys.argv[1]
host = sys.argv[2]
port = int(sys.argv[3])
# Make the socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))
# Prepare the message to send; read in binary mode so len(data) is a byte count
with open(filename, 'rb') as f:
    data = f.read()
send_str = ("send %d %s" % (len(data), filename)).encode()
end_str = ("end %s" % (filename,)).encode()
# sendall keeps writing until every byte has been handed to the OS
sock.sendall(send_str + b'\n')
sock.sendall(data)
sock.sendall(b'\n' + end_str + b'\n')

Either way, you're going to have to make changes to both the server and the client. In the end it would probably be easier to implement a rudimentary HTTP server (or to get one which has already been implemented) in C, as it seems that's what you're doing here. The encoding/padding solution is quick but sends a lot of redundant data (Base64 typically increases the quantity of data sent by about 33%). The length-prefix solution is also easy on the client side but may be more difficult on the server.
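
For a sense of what the server side of the length-prefix approach involves, here is a sketch of the receiving logic. It is written in Python 3 only to keep the examples in one language; the real server is in C and would need equivalent code. The names read_line, recv_exactly and receive_file are hypothetical.

```python
import socket

def read_line(sock):
    """Read one byte at a time up to a newline, so no file data is consumed."""
    line = bytearray()
    while True:
        byte = sock.recv(1)
        if not byte:
            raise ConnectionError("socket closed in the middle of a line")
        if byte == b"\n":
            return bytes(line)
        line += byte

def recv_exactly(sock, n):
    """Block until exactly n bytes have been read; raise if the peer closes."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed before all data arrived")
        data += chunk
    return bytes(data)

def receive_file(sock):
    """Parse 'send <length> <filename>', the file body, then 'end <filename>'."""
    command, length, filename = read_line(sock).decode().split(" ", 2)
    if command != "send":
        raise ValueError("unexpected command: %r" % command)
    data = recv_exactly(sock, int(length))
    read_line(sock)  # consume the newline the client sends after the body
    trailer = read_line(sock).decode()
    if trailer != "end %s" % filename:
        raise ValueError("unexpected trailer: %r" % trailer)
    return filename, data
```

Since the body is read by byte count, a file that happens to contain an "end filename" line is passed through untouched.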

zvoase