So, let's say I'm writing a web server and I want to support "very large" file uploads. Let's further assume that I mean to do this via the standard multipart/form-data MIME type. I should say that I'm using Erlang and that I plan to collect HTTP packets as they are returned from erlang:decode_packet/3, but I do not want to actually collect the request body until the HTTP request handler has found a place for the uploaded content to go. Should I

a) go ahead and collect the body anyway, ignoring the possibility that it is very large and could crash the server by exhausting memory?

b) refrain from receiving on the socket any (possibly non-existent) request body until after the headers have been processed?

c) do something else?

An example for answer c might be: spawn another process to collect and write the uploaded content to a temporary location (in order to minimize memory use), while simultaneously giving that location to the http request handler for future processing. But I just don't know - is there a standard technique here?

A: 

In my implementation I use your example for answer c: I read from the socket chunk by chunk and store the chunks in a temporary file. Also, as far as I know, Yaws uses a similar technique; you can see it in yaws/src/yaws_multipart.erl
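
For illustration, here is a minimal sketch of the chunk-by-chunk idea. It is not the Yaws code. The module name `upload_sketch`, the function `stream_to_file/3`, the 64 KB chunk size, and the 30 second timeout are all made up for the example. It assumes the socket is in passive, binary, raw-packet mode and that the body length is already known (for example from Content-Length). It copies the raw body, boundaries included; it does not parse the multipart parts.

```erlang
-module(upload_sketch).
-export([stream_to_file/3]).

%% Arbitrary chunk size chosen for this sketch.
-define(CHUNK, 65536).

%% Hypothetical helper: copy exactly Remaining bytes from Socket into the
%% file at Path. Assumes Socket was set up with
%% [binary, {active, false}, {packet, raw}].
stream_to_file(Socket, Path, Remaining) ->
    {ok, Fd} = file:open(Path, [write, raw, binary]),
    try
        copy(Socket, Fd, Remaining)
    after
        file:close(Fd)
    end.

copy(_Socket, _Fd, 0) ->
    ok;
copy(Socket, Fd, Remaining) ->
    %% Never ask for more than one chunk, so memory use stays bounded.
    Want = min(Remaining, ?CHUNK),
    case gen_tcp:recv(Socket, Want, 30000) of
        {ok, Data} ->
            ok = file:write(Fd, Data),
            copy(Socket, Fd, Remaining - byte_size(Data));
        {error, Reason} ->
            {error, Reason}
    end.
```

Only one chunk is held in memory at a time, so the upload size is limited by disk space rather than by the Erlang heap.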

W55tKQbuRu28Q4xv
A: 

Storing to a temporary file is also the way PHP does things, so it's a tried and tested way. You could count the bytes received and disconnect if it reaches a size that makes no sense.
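
A rough sketch of the byte-counting part, in Erlang since that is what the question uses. The module `upload_limit` and the function `account/3` are hypothetical names, and the choice of limit is up to the caller.

```erlang
-module(upload_limit).
-export([account/3]).

%% Hypothetical helper: add the size of a newly received chunk to the
%% running total and report whether the cap Max has been exceeded.
account(Received, Chunk, Max) when is_binary(Chunk) ->
    Total = Received + byte_size(Chunk),
    if
        Total > Max -> {error, too_large};
        true -> {ok, Total}
    end.
```

The receive loop would call this for every chunk and close the socket (for example with `gen_tcp:close/1`) as soon as it sees `{error, too_large}`.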

Tor Valamo
A: 

A nice article discussing multipart uploads in MochiWeb is available here:

http://joefreeman.co.uk/blog/2009/12/handling-multipart-uploads-with-mochiweb/

Roberto Aloi
+2  A: 

In my opinion option b is clearly the superior one.

During the period of time that you are not reading the socket, the TCP code will continue to buffer the incoming data within the kernel. As it does so, it will advertise a smaller and smaller TCP window size to the HTTP client, until eventually (when the TCP receive buffers in the kernel are full) the TCP window will close and the client will stop sending.

In other words, by not reading the socket, you are allowing TCP flow control to do its job.
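
A minimal sketch of what this can look like with gen_tcp, under some assumptions: the socket was accepted with `[binary, {active, false}]`, and the `{packet, http_bin}` socket option is used to decode the request line and headers instead of calling erlang:decode_packet by hand. The module `lazy_body` and the function `read_head/1` are names invented for the example, and error handling is left out.

```erlang
-module(lazy_body).
-export([read_head/1]).

%% Hypothetical helper: read only the request line and the headers.
%% Assumes Socket is in passive mode: [binary, {active, false}].
read_head(Socket) ->
    ok = inet:setopts(Socket, [{packet, http_bin}]),
    {ok, {http_request, Method, Uri, _Version}} = gen_tcp:recv(Socket, 0),
    Headers = read_headers(Socket, []),
    %% Switch back to raw mode and stop reading. The body stays in the
    %% socket buffers until the request handler decides where it goes.
    ok = inet:setopts(Socket, [{packet, raw}]),
    {Method, Uri, Headers}.

read_headers(Socket, Acc) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, {http_header, _, Name, _, Value}} ->
            read_headers(Socket, [{Name, Value} | Acc]);
        {ok, http_eoh} ->
            lists:reverse(Acc)
    end.
```

Because the socket is passive, nothing is pulled into the Erlang process after `read_head/1` returns; the handler can inspect the headers, pick a destination, and only then start calling `gen_tcp:recv/2,3` for the body.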

Cayle Spandon
I was secretly looking for justification for doing b, thanks for helping to provide it. For me, it makes better sense from a code maintenance standpoint, but that alone wasn't enough for me to implement it.
Aoriste