Hi,

I have to implement an HTTP client in Java, and for my needs the most efficient approach seems to be HTTP pipelining (as per RFC 2616).

As an aside, I want to pipeline POSTs. (I am not talking about multiplexing. I am talking about pipelining, i.e. sending many requests over one connection before receiving any response: batching of HTTP requests.)
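To make concrete what "many requests over one connection before receiving any response" looks like on the wire, here is a minimal sketch. The class name `PipelinedBatch`, the `text/plain` content type, and the host and path in the usage note are assumptions for illustration, not part of any library. It only serializes the requests back to back; it does not open a socket or read responses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: builds the bytes a pipelining client would write
// to a single connection before it starts reading any response.
final class PipelinedBatch {

    static byte[] build(String host, String path, List<String> bodies) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String body : bodies) {
            byte[] payload = body.getBytes(StandardCharsets.UTF_8);
            // Each request is complete on its own; Content-Length tells the
            // server where this body ends and the next request begins.
            String head = "POST " + path + " HTTP/1.1\r\n"
                    + "Host: " + host + "\r\n"
                    + "Content-Type: text/plain; charset=utf-8\r\n"
                    + "Content-Length: " + payload.length + "\r\n"
                    + "\r\n";
            byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
            out.write(headBytes, 0, headBytes.length);
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }
}
```

For example, `PipelinedBatch.build("example.com", "/actions", List.of("a", "bc"))` yields two complete POST requests concatenated in one buffer, which would then be written to the socket in a single pass.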

I could not find a third-party library that explicitly states that it supports pipelining. I could use e.g. Apache HttpCore to build such a client or, if I have to, build it myself.

My problem is whether this is a good idea. I have not found any authoritative reference showing that HTTP pipelining is more than a theoretical model and is properly implemented by HTTP servers. Additionally, all browsers that support pipelining have the feature off by default.

So, should I try to implement such a client, or will I be in a lot of trouble because of server (or proxy) implementations? Is there any reference that gives guidelines on this?

If it is a bad idea, what would be the alternative programming model for efficiency? Separate TCP connections?

+3  A: 

POST should not be pipelined. From RFC 2616:

8.1.2.2 Pipelining

A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.

Clients which assume persistent connections and pipeline immediately after connection establishment SHOULD be prepared to retry their connection if the first pipelined attempt fails. If a client does such a retry, it MUST NOT pipeline before it knows the connection is persistent. Clients MUST also be prepared to resend their requests if the server closes the connection before sending all of the corresponding responses.

Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods (see section 9.1.2). Otherwise, a premature termination of the transport connection could lead to indeterminate results. A client wishing to send a non-idempotent request SHOULD wait to send that request until it has received the response status for the previous request.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html
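As a rough illustration of the SHOULD NOT rule quoted above, a client could gate pipelining on the request method. This is only a sketch: `PipelinePolicy` and `mayPipeline` are made-up names, and the method set follows the list in RFC 2616 section 9.1.2. It checks single methods only, not the "non-idempotent sequences of methods" that the RFC also mentions.

```java
import java.util.Set;

// Hypothetical policy helper, not part of any HTTP library.
final class PipelinePolicy {

    // Methods RFC 2616 section 9.1.2 describes as idempotent.
    private static final Set<String> IDEMPOTENT =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    // True if the RFC's SHOULD NOT does not apply to this method.
    // HTTP method names are case-sensitive, so the match is exact.
    static boolean mayPipeline(String method) {
        return IDEMPOTENT.contains(method);
    }
}
```

With this check, `mayPipeline("GET")` is true and `mayPipeline("POST")` is false, so a POST would wait until the response status for the previous request has arrived.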

arjan
Thanks for the reply. But per RFC 2119, SHOULD NOT means: "there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful". This is one of those cases, unless there is an implication in the SHOULD NOT definition that I am failing to understand.
+4  A: 

I've implemented a pipelined HTTP client. The basic concept sounds easy, but error handling is very hard. The performance gain was so insignificant that we gave up on the concept a long time ago.

In my opinion, it doesn't make sense for the normal use case. It only has some benefit when the requests are logically connected. For example, you have a 3-request transaction and can send all of them in one batch. But normally, if requests can be pipelined, you can also combine them into one request.

Following are just some of the hurdles I can remember:

  1. HTTP keep-alive does not guarantee a persistent connection. If you have 3 requests pipelined on the connection and the server drops the connection after the first response, you are supposed to retry the other two requests.

  2. When you have multiple connections, load balancing is also tricky. If there is no idle connection, you can either use a busy connection or create a new one.

  3. Timeouts are also tricky. When one request times out, you have to discard all the requests after it, because the responses must come back in order.
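The bookkeeping behind hurdles 1 and 3 can be sketched roughly as follows. This is not the code from my client, just a minimal illustration; `PipelineTracker`, `Pending`, and the method names are hypothetical. The idea is a FIFO queue of in-flight requests: each response belongs to the oldest entry, and on a dropped connection or a timeout everything still queued has to be handed back for retry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bookkeeping for one pipelined connection.
final class PipelineTracker {

    // A request we may have to resend, so the body is kept as well.
    static final class Pending {
        final String method;
        final String path;
        final byte[] body;

        Pending(String method, String path, byte[] body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }
    }

    private final Deque<Pending> inFlight = new ArrayDeque<>();

    // Call after the request bytes have been written to the connection.
    void sent(Pending request) {
        inFlight.addLast(request);
    }

    // Responses come back in request order, so the next response always
    // answers the oldest in-flight request. Returns null if none is pending.
    Pending responseArrived() {
        return inFlight.pollFirst();
    }

    // Connection dropped or the head request timed out: nothing still in
    // flight can be answered on this connection, so return it all for retry.
    List<Pending> drainForRetry() {
        List<Pending> retry = new ArrayList<>(inFlight);
        inFlight.clear();
        return retry;
    }

    int inFlightCount() {
        return inFlight.size();
    }
}
```

If three requests are sent and the server closes the connection after the first response, `drainForRetry()` returns the remaining two in their original order, ready to be resent on a new connection.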

ZZ Coder
@ZZ Coder Thank you! In your client, did you pipeline POSTs as well? My case is not normal. I want to pipeline real-time POSTs that trigger actions in a call center. Any info you might remember, especially about server/proxy behavior, is appreciated!
Yes. It handles POST. There is no difference, except that you have to remember the body if you implement retry logic.
ZZ Coder
A: 

Pipelining makes almost no difference to HTTP servers; they usually process the requests on a connection serially anyway: read a request, write a response, then read the next request...

But a client would very likely improve throughput by multiplexing. Websites usually have multiple machines with multiple CPUs, so why voluntarily limit your requests to a single line? Today it's more about horizontal scalability (concurrent requests). Of course, it's best to benchmark it.

irreputable
@irreputable In pipelining, at least by definition, the interaction is not serial, since the requests arrive in batches. Also, what if there is a limit on the number of open connections to the same server?