I have a Python client which pushes a great deal of data through the standard library's httplib. Users are complaining that the application is slow. I suspect that this may be partly due to the HTTP client I am using.

Could I improve performance by replacing httplib with something else?

I've seen that Twisted offers an HTTP client. It seems to be very basic compared to their other protocol offerings.

PyCurl might be a valid alternative. Its use seems to be very un-pythonic, but if its performance is really good then I can put up with a bit of un-pythonic code.

So if you have experience with better HTTP client libraries for Python, please tell me about them. I'd like to know what you thought of the performance relative to httplib and what you thought of the quality of implementation.

UPDATE 0: My use of httplib is actually very limited - the replacement needs to do the following:

    conn = httplib.HTTPConnection(host, port)
    conn.request("POST", url, params, headers)
    compressedstream = StringIO.StringIO(conn.getresponse().read())

That's all: No proxies, redirection or any fancy stuff. It's plain-old HTTP. I just need to be able to do it as fast as possible.

UPDATE 1: I'm stuck with Python 2.4 and I'm working on 32-bit Windows. Please do not tell me about better ways to use httplib - I want to know about some of the alternatives to httplib.

+1  A: 

You seem to assume it's the library. It's open source, so it would be worth checking the code to see if it is.

You mention that you're sending a lot of data over HTTP. The inefficiency might be because of the library, but HTTP isn't the most efficient protocol for sending large amounts of data. Then again, it could simply be how you use the library (are you sending a big string or list, or using a stream or generators?).

Richard Levasseur
Yes, but I'm also testing a bunch of other things. I'm really trying to find out the benefits of other HTTP client libraries. I do not maintain the server so I've got no choice other than to use HTTP.
Salim Fadhley
+1 for challenging assumptions.
Greg Hewgill
Fair enough. Your revised question is a lot better and makes that clear :)
Richard Levasseur
+15  A: 

Often when I've had performance problems with httplib, the problem hasn't been with httplib itself, but with how I'm using it. Here are a few common pitfalls:

(1) Don't make a new TCP connection for every web request. If you are making lots of requests to the same server, instead of this pattern:

    conn = httplib.HTTPConnection("www.somewhere.com")
    conn.request("GET", '/foo')
    conn = httplib.HTTPConnection("www.somewhere.com")
    conn.request("GET", '/bar')
    conn = httplib.HTTPConnection("www.somewhere.com")
    conn.request("GET", '/baz')

Do this instead:

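    conn = httplib.HTTPConnection("www.somewhere.com")
    conn.request("GET", '/foo')
    conn.getresponse().read()  # read each response fully before reusing the connection
    conn.request("GET", '/bar')
    conn.getresponse().read()
    conn.request("GET", '/baz')
    conn.getresponse().read()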

(2) Don't serialize your requests. You can use threads or asyncore or whatever you like, but if you are making multiple requests to different servers, you can improve performance by running them in parallel.
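
As a rough sketch of point (2), assuming plain threads are acceptable for your workload: the code below starts one thread per request and collects the results in order. The names `fetch`, `fetch_parallel` and the `fetch_one` parameter are made up for this illustration and are not part of httplib. The import fallback is only there so the same sketch also runs on Python 3, where the module is called `http.client`.

```python
import sys
import threading

try:
    import httplib  # Python 2 name, as used in the question
except ImportError:
    import http.client as httplib  # same module under its Python 3 name


def fetch(host, path):
    """Fetch one path from one host and return the response body."""
    conn = httplib.HTTPConnection(host)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()


def fetch_parallel(targets, fetch_one=fetch):
    """Call fetch_one(host, path) for every (host, path) pair, one thread each.

    Returns a list in the same order as targets. If a request raises,
    the exception object is stored in that slot instead of a body.
    """
    results = [None] * len(targets)

    def worker(index, host, path):
        try:
            results[index] = fetch_one(host, path)
        except Exception:
            # sys.exc_info() keeps this compatible with old and new Pythons
            results[index] = sys.exc_info()[1]

    threads = [threading.Thread(target=worker, args=(i, host, path))
               for i, (host, path) in enumerate(targets)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every request has finished
    return results
```

Something like `fetch_parallel([("www.somewhere.com", "/foo"), ("www.elsewhere.com", "/bar")])` would then issue both requests at the same time. One thread per request is only reasonable for a handful of hosts; for many requests you would probably want a bounded pool of workers.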

+15  A: 

Users are complaining that the application is slow. I suspect that this may be partly due to the HTTP client I am using.

Could I improve performance by replacing httplib with something else?

Do you suspect it or are you sure that it's httplib? Profile before you do anything to improve the performance of your app.

I've found my own intuition on where time is spent is often pretty bad (given that there isn't some code kernel executed millions of times). It's really disappointing to implement something to improve performance then pull up the app and see that it made no difference.

If you're not profiling, you're shooting in the dark!
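
For what it's worth, here is a minimal sketch of what profiling could look like with the standard library's `cProfile` and `pstats`. It is written for a current Python 3; `cProfile` only arrived in Python 2.5, so on the 2.4 mentioned in the question the pure-Python `profile` module would be the closest equivalent. `slow_step`, `fast_step`, `run_once` and `profile_report` are placeholder names invented for this example; in a real run you would pass in the function that does your HTTP POST and decompression.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    """Stand-in for the suspected bottleneck (e.g. the HTTP round trip)."""
    time.sleep(0.05)


def fast_step():
    """Stand-in for everything else the application does."""
    return sum(range(1000))


def run_once():
    """One unit of work, as the application would perform it."""
    slow_step()
    fast_step()


def profile_report(func, top=5):
    """Profile func() and return the top entries by cumulative time as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(run_once))
```

The report lists functions by cumulative time, so if the HTTP call really dominates it should show up near the top; if it does not, swapping the client library is unlikely to help much.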

Kobold
+1 for recommending profiling.
Greg Hewgill
+1  A: 

httplib2 is another option: http://code.google.com/p/httplib2/

I have never benchmarked or profiled it in comparison to httplib, but I would also be interested in any findings there.

Corey Goldberg
It doesn't appear to be under very active development (the latest release, 0.4.0, was in Oct 2007). It has some serious bugs that need fixing and a rolled-up release =(.
Kurt
I think Joe G. is still maintaining it, but it does have some issues.
Corey Goldberg
+4  A: 

PyCurl is awesome, and extremely high performance.

jnoller
A: 

httplib2 is a very good option. Joe Gregorio has fixed many bugs of httplib.

karlcow
-1 for duplicate answer
Corey Goldberg
+1  A: 

As others answered, httplib2 is a good alternative because it handles headers properly and can cache responses, but I doubt this would help with POST performance.

An alternative that might actually give you a performance boost for POST, especially on Windows, is the new HTTP 1.1 client in Twisted.web.

Van Gale