A: 

read(2) will return no more bytes than you asked for. This might leave more data in the stdin file descriptor for reading (in case the client sends a CONTENT_LENGTH of 0 but hands you their /dev/urandom) but that's okay. Your process is free to go away without reading it all.

read(2) may return fewer bytes than you ask for. This could be because not all the data has arrived yet and the kernel is tired of blocking, or because the content is shorter than CONTENT_LENGTH claims. I'm glad you're limiting the length to something 'reasonable': it'd be pretty easy to pass in a CONTENT_LENGTH that is the maximum size_t value, or that value minus one, or that value minus two, and play games with size arithmetic that wraps around, so that your malloc() allocates 0, 1, or 2 bytes and you happily scribble all over your memory.
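To make the length check concrete, here is a minimal sketch of validating CONTENT_LENGTH before it gets anywhere near malloc(). The function name parse_content_length and the MAX_POST_BYTES cap are hypothetical, made up for illustration; the point is only that the value is rejected before any "+ 1" style arithmetic can wrap.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap; pick whatever is 'reasonable' for your application. */
#define MAX_POST_BYTES ((size_t)65536)

/*
 * Parse a CONTENT_LENGTH string into *out.
 * Returns 0 on success, -1 if the value is missing, malformed,
 * signed, or larger than MAX_POST_BYTES.
 */
int parse_content_length(const char *s, size_t *out)
{
    char *end;
    unsigned long long n;

    /* Require a leading digit: rejects NULL, "", "-1", "+5", " 7". */
    if (s == NULL || *s < '0' || *s > '9')
        return -1;

    errno = 0;
    n = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;              /* out of range, or trailing junk */
    if (n > MAX_POST_BYTES)
        return -1;              /* rejected before any size arithmetic */

    *out = (size_t)n;
    return 0;
}
```

In a CGI program you would typically pass it getenv("CONTENT_LENGTH") and only then allocate len + 1 bytes, which can no longer overflow because len is at most MAX_POST_BYTES.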

sarnold
How will read know it is smaller if, as this tutorial says, stdin does not necessarily provide an EOF? http://www.cs.tut.fi/~jkorpela/forms/cgic.html
bobby
@bobby, I wasn't aware of CGI's limitations w.r.t. stdin and EOF. Thanks for the pointer. (Funnily enough, I've even read this page once before, years ago; the line "CGI programming in C is clumsy and error-prone." sounded _very_ familiar.)
sarnold
@bobby: STDIN still behaves like a stream, and if it terminates before CONTENT_LENGTH bytes, then it will do so with EOF; that's what "terminates" means. The difficulty is what happens if the socket delivers *more* bytes than CONTENT_LENGTH says there will be. That's why CGI leaves it undefined what happens if you read more than CONTENT_LENGTH, and why it can't guarantee an EOF (e.g. if the other end just keeps writing bytes forever, STDIN might keep delivering them forever, so no EOF). Since you don't read more than CONTENT_LENGTH, you don't have to worry about the possible lack of EOF.
Steve Jessop
I should add that EOF isn't (necessarily) a character in a stream - functions like `getchar` which return EOF do so because end-of-stream has been detected, not because the socket/file/terminal/whatever has literally sent an EOF byte. Different protocols indicate that internally in different ways.
Steve Jessop
A: 

You should call read in a loop.

This is true in general, and it's especially true when reading from a socket. It's quite likely that medium-to-large amounts of data will take a while to arrive, but read typically returns as soon as possible (I can't remember whether that's required or not). If the first TCP packet contains about 1k of data, out of 9k, then the first call to read will likely return that 1k before the next packet arrives, and your current code will never read the rest.

So, keep calling read (and each time advance the pointer you pass in by the number of bytes read, and reduce the number of bytes to read likewise), until one of the following:

  • you have read CONTENT_LENGTH bytes in total.
  • read returns 0 (in which case the POST data is shorter than promised by CONTENT_LENGTH).
  • read returns -1 (indicating an error has occurred) and errno is something other than EINTR. If O_NONBLOCK is set, EAGAIN also just means "try again", in which case you'll want to sleep or similar before the next read, but there's no point setting O_NONBLOCK in this case.
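The loop described above might look something like the following sketch. The name read_fully is hypothetical, and the code assumes a blocking descriptor (no O_NONBLOCK), so EAGAIN is not treated specially and is reported as an error like any other.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read up to `want` bytes from `fd` into `buf`, retrying short reads.
 * Returns the number of bytes actually read (fewer than `want` if
 * end-of-stream came early), or -1 on an error other than EINTR.
 */
ssize_t read_fully(int fd, char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        /* Advance the pointer and shrink the request by what we have. */
        ssize_t n = read(fd, buf + got, want - got);

        if (n > 0) {
            got += (size_t)n;   /* partial or complete read */
        } else if (n == 0) {
            break;              /* data ended before `want` bytes */
        } else if (errno != EINTR) {
            return -1;          /* real error; errno is left set */
        }
        /* n == -1 with EINTR: interrupted by a signal, just retry. */
    }
    return (ssize_t)got;
}
```

For the CGI case you would call it as read_fully(STDIN_FILENO, buf, content_length) and compare the result with content_length to detect a body that is shorter than promised.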
Steve Jessop