ansaurus

Question

How to correctly parse incoming HTTP requests.

Answer 1

+2 A:

You could try looking at their code to see how they handle an HTTP message.

Or you could look at the spec: there are message length fields you should use. Apparently, only buggy browsers send additional CRLFs at the end.

gbjbaanb 2010-09-13 07:21:19

The HTTPbis WG has clarified message parsing; see http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-11.html#message.body for the current draft text.

Julian Reschke 2010-09-13 08:42:26

This looks good, thanks. If that helps, I will gladly accept your answer.

PeterK 2010-09-13 09:40:37

Answer 2

+2 A:

If you're set on writing your own parser, I'd take the Zed Shaw approach: use the Ragel state machine compiler and build your parser based on that. Ragel can handle input arriving in chunks, if you're careful.

Honestly, though, I'd just use something like this.

Your go-to resource should be RFC 2616, which describes HTTP/1.1 and gives you what you need to construct a parser. Good luck!

Jack Kelly 2010-09-13 07:28:11

+1 for the http-parser and definitive links. That source would generate ***FAST*** code, I'm really impressed. That's badass.

Matt Joiner 2010-09-13 13:20:52

Answer 3

A:

HTTP GET/HEAD requests have no body, and a POST request can have no body too. You have to check whether it's a GET/HEAD; if it is, then no content (body/message) was sent. If it was a POST, do as the spec says about parsing a message of known/unknown length, as @gbjbaanb said.

aularon 2010-09-13 07:36:01

GET and HEAD requests *can* have a body. So no, you don't check the method name.

Julian Reschke 2010-09-13 08:39:51

@Julian, the HTTP specification doesn't exactly say whether you can include a body in GET/HEAD requests. I tested it locally and it works with Apache, but I have never seen that in a real-world implementation. I'm reading http://stackoverflow.com/questions/978061/ and http://stackoverflow.com/questions/1266596/ now; thanks for pointing that out.

aularon 2010-09-13 10:14:10

@aularon whether something is used in practice and whether it's allowed are separate questions. What's important is that request parsing is just the same for all methods (contrary to response parsing, where HEAD is special). See also http://trac.tools.ietf.org/wg/httpbis/trac/ticket/19 -- that's why we're revising RFC 2616, after all.

Julian Reschke 2010-09-13 15:34:47

@Julian sure thing.

aularon 2010-09-13 17:21:14

Answer 4

A:

In any case, an HTTP request has "\r\n\r\n" at the end of the request headers and before the request data, if any, even if the request is just "GET / HTTP/1.0\r\n\r\n".

If the method is "POST", you should read as many bytes after "\r\n\r\n" as specified in the Content-Length field.

So pseudocode is:

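// read the request line and headers, up to and including the blank line
read_until(buf, "\r\n\r\n");
if (buf.starts_with("POST"))
{
   // the header value gives the body size in bytes
   contentLength = regex("^Content-Length: (\d+)$").find(buf)[1];
   read_all(buf, contentLength);
}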

There will be "\r\n\r\n" after the content only if the content itself includes it. The content may be binary data; it has no terminating sequence, and the only way to get its size is to use the Content-Length field.

Abyx 2010-09-13 08:31:14
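
For illustration, the pseudocode above could be turned into something runnable along the following lines. This is only a sketch under stated assumptions: the names `read_until`, `read_exact`, and `read_request` are made up for this example, the stream is any binary file-like object with a `read(n)` method (for a socket, something like `sock.makefile("rb")`), and, unlike the pseudocode, it does not look at the method name, since the comments in this thread point out that body handling is the same for all methods. It handles only `Content-Length` framing.

```python
import io
import re

# Matches a Content-Length header line inside the raw header block.
CONTENT_LENGTH = re.compile(
    rb"^Content-Length:[ \t]*(\d+)[ \t]*\r?$", re.IGNORECASE | re.MULTILINE
)


def read_until(stream, delimiter: bytes) -> bytes:
    """Read byte by byte until `delimiter` is seen; return data including it."""
    buf = bytearray()
    while not buf.endswith(delimiter):
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended before the end of the headers")
        buf += byte
    return bytes(buf)


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because read() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        piece = stream.read(n - len(buf))
        if not piece:
            raise EOFError("stream ended before the end of the body")
        buf += piece
    return bytes(buf)


def read_request(stream):
    """Return (head, body); the body is empty if no Content-Length is given."""
    head = read_until(stream, b"\r\n\r\n")
    match = CONTENT_LENGTH.search(head)
    body = read_exact(stream, int(match.group(1))) if match else b""
    return head, body


# Usage with an in-memory stream standing in for a connection.
demo = io.BytesIO(b"POST /x HTTP/1.1\r\nHost: a\r\nContent-Length: 5\r\n\r\nhello")
demo_head, demo_body = read_request(demo)
```

Reading one byte at a time keeps the sketch short and never consumes bytes that belong to the next request; a real server would normally buffer instead.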

No, it does not depend on the method name. See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-11.html#message.body for details.

Julian Reschke 2010-09-13 08:41:45
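
To make the point about not depending on the method name concrete, here is a rough sketch of deciding how a request body is framed from the headers alone. The function names (`split_head`, `parse_head`, `request_body_framing`) are invented for this example, and the header handling is deliberately simplified (repeated header fields simply overwrite each other, and malformed input is not handled).

```python
def split_head(data: bytes):
    """Split raw bytes at the first blank line; None if headers are incomplete."""
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        return None
    return head, rest


def parse_head(head: bytes):
    """Parse the request line and headers into (method, target, version, headers)."""
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


def request_body_framing(headers):
    """Return ("chunked", None), ("length", n) or ("none", 0).

    The method name is not consulted: only the headers decide.
    """
    codings = [
        c.strip().lower()
        for c in headers.get("transfer-encoding", "").split(",")
        if c.strip()
    ]
    if codings and codings[-1] == "chunked":
        return ("chunked", None)  # Transfer-Encoding wins over Content-Length
    if "content-length" in headers:
        return ("length", int(headers["content-length"]))
    return ("none", 0)  # a request with neither header has no body


raw = b"GET / HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\nhello"
head, rest = split_head(raw)
method, target, version, headers = parse_head(head)
framing = request_body_framing(headers)
```

In this example the GET request carries a five-byte body, and the parser finds it the same way it would for a POST.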

Also, keep in mind that HTTP 1.1 requests do not need to use a `Content-Length` header, either. They can use `Transfer-Encoding: chunked` instead, in which case the message length is encoded inside the message data itself.

Remy Lebeau - TeamB 2010-09-13 20:05:15
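
As a rough illustration of how the length is encoded inside the data with `Transfer-Encoding: chunked`, the sketch below decodes a chunked body that is already fully in memory. The function name `decode_chunked` is made up for this example; it ignores chunk extensions and any trailer fields after the last chunk, and it does no error handling for malformed input.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked body held in memory (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a line holding its size in hexadecimal,
        # optionally followed by ";extension" parameters.
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return bytes(body)


decoded = decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
```

A parser working on a live connection would do the same steps incrementally, reading each size line and then exactly that many bytes.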
