I'm working on a client-server program for the first time, and I'm feeling woefully inadequate on where to begin for what I'm doing.

I'm going to use Google Protocol Buffers to transfer binary data between my client and my server. I'm going to be using the Python variant. The basic idea, as I understand, is that the client will serialize the data, send it to the server, which will then deserialize the data.

The problem is, I'm really not sure where to begin for sending binary data to the server. I was hoping it'd be something "simple" like an HTTP request, but I've been searching around Google for ways to transfer binary data and getting lost in the vast multitude of tutorials, guides and documentation. I can't even tell if I'm barking up the wrong tree by investigating HTTP transfers (I was hoping to use it, so I could knock it up a notch to HTTPS if security is necessary). I really don't want to have to go to the level of socket programming, though - I'd like to use the libraries available before turning to that. (I'd also prefer standard Python libraries, though if there's the perfect 3rd party library I'll live.)

So, if anyone has a good starting point (or wants to explain a bit themselves) on a good way to transfer binary data via Python, I'd be grateful. The server I'm running currently uses Apache with mod_python, by the way.

+4  A: 

Any time you're going to move binary data from one system to another, there are a couple of things to keep in mind.

Different machines store the same information differently, most notably the byte order of multi-byte values. This has implications both in memory and on the network. More info here (http://en.wikipedia.org/wiki/Endianness)
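To make the byte-order point concrete, here is a small sketch in Python 3 (the value 0x0A0B0C0D is just an arbitrary example, not anything from the question). It shows the same integer laid out both ways, and what happens if the receiver assumes the wrong order:

```python
# The same 32-bit integer laid out in the two common byte orders.
value = 0x0A0B0C0D  # arbitrary example value

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Reading the bytes with the wrong byte order gives a different number
# without raising any error.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))  # 0xd0c0b0a
```

This is why both ends have to agree on one byte order up front, whatever the native order of each machine happens to be.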

Because you're using Python, you can cut yourself some slack here (assuming the client and server will both be in Python) and just use cPickle to serialize your data. If you really want binary, you're going to have to get comfortable with Python's struct module (http://docs.python.org/library/struct.html) and learn how to pack/unpack your data.
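As a rough idea of what packing and unpacking with struct looks like, here is a minimal Python 3 sketch. The record layout (a message type, a length, and a float reading) is made up purely for illustration:

```python
import struct

# Hypothetical record layout: a 2-byte unsigned message type, a 4-byte
# unsigned payload length, and an 8-byte double.
# "!" selects network (big-endian) byte order with standard sizes and no padding.
RECORD = struct.Struct("!HId")

packed = RECORD.pack(7, 253521, 3.5)
print(len(packed))  # 14 bytes: 2 + 4 + 8

# The receiver unpacks with the same format string to get the values back.
msg_type, length, reading = RECORD.unpack(packed)
print(msg_type, length, reading)
```

Spelling out the byte order in the format string means the bytes on the wire look the same no matter which machine produced them.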

I would first start out with simple line-protocol servers until you get past the difficulty of network communication. If you've never done it before, it can get confusing very fast: how to issue commands, how to pass data, how to re-sync on errors, etc.

If you already know the basics of client/server protocol design, then practice packing and unpacking binary structures on your disk first. I also refer to the RFCs of HTTP and FTP for cases like this.

-------EDIT BASED ON COMMENT--------

Normally this sort of thing is done by sending the server a "header" that contains a checksum for the file as well as the size of the file in bytes. Note that I don't mean an HTTP header; you can customize it however you want. The chain of events needs to go something like this...

CLIENT: "UPLOAD acbd18db4cc2f85cedef654fccc4a4d8 253521"
SERVER: "OK"
(server splits the text line to get the command, checksum, and size)
CLIENT: "010101101010101100010101010etc..." (up to 253521 bytes)
(server reassembles all received data into a file, then checksums it to make sure it matches the original)
SERVER: "YEP GOT IT"
CLIENT: "COOL CYA"

This is overly simplified, but I hope you can see what I'm talking about here.
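The header side of that exchange could be sketched roughly like this in Python 3. Everything here is an assumption for illustration: the function names are made up, MD5 is picked only because the example checksum above is 32 hex digits long, and the actual network transport is left out entirely so the sketch runs in memory:

```python
import hashlib

def build_header(payload: bytes) -> bytes:
    """Client side: build the 'UPLOAD <checksum> <size>' text line."""
    checksum = hashlib.md5(payload).hexdigest()  # MD5 is an assumed choice
    return f"UPLOAD {checksum} {len(payload)}\n".encode("ascii")

def parse_header(line: bytes):
    """Server side: split the text line into command, checksum, and size."""
    command, checksum, size = line.decode("ascii").split()
    return command, checksum, int(size)

def verify(payload: bytes, checksum: str, size: int) -> bool:
    """Server side: check the reassembled data against the header."""
    return len(payload) == size and hashlib.md5(payload).hexdigest() == checksum

# In-memory walk-through of one upload (no sockets involved).
data = b"\x00\x01\x02 some binary payload \xff"
command, checksum, size = parse_header(build_header(data))
print(command, size)                         # UPLOAD 25
print(verify(data, checksum, size))          # True
print(verify(data + b"!", checksum, size))   # False
```

In a real server the payload would arrive in several chunks, so the server would keep reading until it has collected `size` bytes before calling something like `verify`.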

Trey Stout
I've already got the data (let's say in a file) - the question really is, how do I transfer the file? (I suppose I made it more complicated-sounding than I should have by talking about binary data, but file transfer is what I meant.)
+3  A: 

I'm not sure I got your question right, but maybe you can take a look at the Twisted project.

As you can see in the FAQ, "Twisted is a networking engine written in Python, supporting numerous protocols. It contains a web server, numerous chat clients, chat servers, mail servers, and more. Twisted is made up of a number of sub-projects which can be accessed individually[...]".

The documentation is pretty good, and there are lots of examples on the internet. Hope it helps.

Renato Besen
+1  A: 

I guess it depends on how tied you are to Google Protocol Buffers, but you might like to check out Thrift.

Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.

There's a great example for getting started on their home page.

Craz
A: 

One quick question: why binary? Is the payload itself binary, or do you just prefer a binary format? If the former, it's possible to use base64 encoding with JSON or XML too; it does use more space (~33%) and a bit more processing overhead, but not necessarily enough to matter for many use cases.
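For what it's worth, a minimal Python 3 sketch of the base64-in-JSON idea might look like this. The field names ("filename", "data") and the random payload are placeholders, not part of any particular API:

```python
import base64
import json
import os

payload = os.urandom(3000)  # stand-in for arbitrary binary data

# Sender: base64 turns the bytes into ASCII text that JSON can carry.
message = json.dumps({
    "filename": "example.bin",  # hypothetical field
    "data": base64.b64encode(payload).decode("ascii"),
})

# Receiver: parse the JSON, then decode the text back into bytes.
received = json.loads(message)
restored = base64.b64decode(received["data"])
print(restored == payload)  # True

# Base64 emits 4 characters for every 3 input bytes.
encoded_len = len(received["data"])
print(encoded_len, len(payload))  # 4000 3000
```

The resulting string can go in any ordinary text-based request body, which keeps the transport side simple at the cost of the extra size.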

StaxMan