Is this possible to do? I'm currently coding in PHP using the cURL library, but the question really applies to HTTP as a whole.
The most obvious way seemed to be sending a HEAD request to the data URL and reading its Content-Length header. The problem is that some servers, including Apache 2.0, do not send Content-Length in response to HEAD requests, and since the header is not mandatory, there is no guarantee that every server out there will provide it, even on a GET request.
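For reference, the HEAD approach can be sketched roughly as follows. This is only a minimal sketch: `parseContentLength` and `headContentLength` are hypothetical helper names of my own, the 10-second timeout is an arbitrary assumption, and the result is `null` whenever the server omits the header, which is exactly the unreliable case described above.

```php
<?php
// Hypothetical helper: read Content-Length from the final header block of a
// raw response. When redirects are followed, cURL returns one header block per
// response, so only the last block is inspected. Returns null if the header
// is absent.
function parseContentLength(string $rawHeaders): ?int
{
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    $last = end($blocks);
    if (preg_match('/^Content-Length:\s*(\d+)\s*$/mi', $last, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: send a HEAD request and report the advertised size,
// or null if the request failed or no Content-Length was sent.
function headContentLength(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // issue HEAD instead of GET
        CURLOPT_HEADER         => true, // include response headers in the result
        CURLOPT_RETURNTRANSFER => true, // return the result as a string
        CURLOPT_FOLLOWLOCATION => true, // follow redirects to the final URL
        CURLOPT_TIMEOUT        => 10,   // assumed value, adjust as needed
    ]);
    $raw = curl_exec($ch);
    return $raw === false ? null : parseContentLength($raw);
}
```

A caller would compare the result of `headContentLength($url)` against its size limit and still needs a fallback for the `null` case.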
I'm having the server download web pages specified by user input and store them, but I don't want it to download an entire response only to discover afterwards that the file is too large and has to be discarded, since malicious requests could choke my bandwidth that way. So I want to know the size of the content reliably, before the data is actually transferred.
I'm not concerned about malicious web servers sending a wrong Content-Length or about other rare edge cases, as long as the approach works for the general case.
The worst idea I have so far is to simply download the content with a GET request and drop the connection if it exceeds the size limit during the transfer, but this seems like a very ugly solution for a protocol as general as HTTP.
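To make that fallback concrete, here is a minimal sketch of what I mean, assuming cURL's write callback is used to count bytes: when the callback returns anything other than the length of the chunk it was given, cURL aborts the transfer. The names `makeLimitedWriter` and `fetchWithLimit` are hypothetical, and the 30-second timeout is an arbitrary assumption.

```php
<?php
// Hypothetical helper: build a write callback that appends each chunk to
// $buffer and signals an abort once the total would exceed $maxBytes.
function makeLimitedWriter(int $maxBytes, string &$buffer): Closure
{
    return function ($ch, string $chunk) use ($maxBytes, &$buffer): int {
        if (strlen($buffer) + strlen($chunk) > $maxBytes) {
            return -1; // any value other than strlen($chunk) aborts the transfer
        }
        $buffer .= $chunk;
        return strlen($chunk);
    };
}

// Hypothetical helper: GET the URL, giving up as soon as the body grows past
// $maxBytes. Returns the body, or null on failure or when the limit is hit.
function fetchWithLimit(string $url, int $maxBytes): ?string
{
    $body = '';
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true, // follow redirects to the final URL
        CURLOPT_TIMEOUT        => 30,   // assumed value, adjust as needed
        CURLOPT_WRITEFUNCTION  => makeLimitedWriter($maxBytes, $body),
    ]);
    $ok = curl_exec($ch); // false when the callback aborted or the request failed
    return $ok === false ? null : $body;
}
```

With this, at most roughly one chunk beyond the limit is ever transferred, but the connection still has to be opened and partially read, which is the part that feels ugly to me.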
Does anyone have any better ideas?