I believe with JPGs, the width and height information is stored within the first few bytes. What's the easiest way to get this information given an absolute URI?

+1  A: 

I am a bit rusty at this, but with JPEG it might not be as simple as it seems. A JPEG has a header within each segment of data, which has its own height / width and resolution. JPEG was not designed to be streamed easily. You might need to read the entire image to find the width and height of each segment in order to get the overall width and height.

If you absolutely need to stream an image, consider changing to another format that is easier to stream; JPEG is going to be tough.

You could do it if you can develop a server-side program that seeks forward and reads the header of each segment to compute its width and height.

Andrew Keith
Oh, I don't have a choice on image format or any server control (otherwise I'd store the width/height in a DB, or get the OS to read it for me). I'm mining data and want to save on download times/bandwidth. Only interested in large images :)
Mark
And don't web browsers stream JPGs all the time? I'm pretty sure my browser knows what size the image is before it's completely finished downloading.
Mark
I am not sure. Try searching for whether it's possible to request the width and height from the web server itself. The web server might be telling the browser the size of the image.
Andrew Keith
"Only interested in large images": perhaps in that case you can do an HTTP HEAD on the image files, see how large each image file is (in bytes, not pixels, though), and use that as an indicator?
Matthew Lock
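A minimal sketch of the HEAD idea above, using the same HttpWebRequest API that appears elsewhere in this thread. The class name, method names, and the minBytes threshold are made up for illustration; servers are also free to omit Content-Length, in which case this reports -1.

```csharp
using System;
using System.Net;

static class RemoteFileSize
{
    // Returns the Content-Length reported by the server,
    // or -1 if the server did not send that header.
    public static long GetContentLength(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD"; // headers only; the response carries no body
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            return response.ContentLength;
        }
    }

    // minBytes is a hypothetical cutoff chosen by the caller.
    public static bool LooksLarge(string url, long minBytes)
    {
        return GetContentLength(url) >= minBytes;
    }
}
```

File size in bytes is only a rough proxy for pixel dimensions, since compression quality varies a lot between images.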
Yeah... that might be accurate enough for me. When I create a web request like this http://www.java2s.com/Code/CSharp/Network/GetHTTPRequestHeaders.htm it's *only* getting the headers, right? Not the whole file?
Mark
And actually, about the browser knowing beforehand, I guess that's not really true... not unless the JPG is interlaced, I suspect. I find this kind of strange though, why weren't JPGs created with the web in mind? Would prevent content from being pushed around so much when people don't specify the width/height in the HTML.
Mark
Nvm. This http://stackoverflow.com/questions/122853/c-get-http-file-size answers my other question. Thanks! :D
Mark
+1  A: 

It's a bit Heath Robinson, but since browsers seem to be able to do it perhaps you could automate IE to download the image within a webpage and then interrogate the browser's DOM to reveal the dimensions before the image has finished downloading?

Matthew Lock
That's a bit ridiculous ;)
Mark
You won't be able to beat the browser's ability to get the image dimensions from all manner of weird JPEGs with the minimum amount of downloaded data, though.
Matthew Lock
+4  A: 

First, you can request the first hundred bytes of an image using the Range header.

using System.Net;
// Request only the first 100 bytes (offsets 0 through 99) via the Range header.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.AddRange(0, 99);

Next, you need to decode. The unix file command has a table of common formats, and the locations of key information. I'd suggest installing Cygwin and taking a look at /usr/share/file/magic.

For GIFs and PNGs, you can easily get the image dimensions from the first 32 bytes. However, for JPEGs, @Andrew is correct that you can't reliably get this information from the first few bytes. You can figure out whether it has a thumbnail, and the size of the thumbnail.
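As a rough sketch of the GIF and PNG case: both formats put the dimensions at fixed offsets near the start of the file. The class and method names here are hypothetical, and the input is assumed to be the first bytes of the file (for example, the result of a range request).

```csharp
using System;

static class HeaderDimensions
{
    // Returns true and fills width/height if the buffer starts with a
    // GIF or PNG header; returns false for anything else.
    public static bool TryRead(byte[] b, out int width, out int height)
    {
        width = 0;
        height = 0;

        // GIF: "GIF87a" or "GIF89a", then width and height as
        // 16-bit little-endian values at offsets 6 and 8.
        if (b.Length >= 10 && b[0] == 'G' && b[1] == 'I' && b[2] == 'F')
        {
            width = b[6] | (b[7] << 8);
            height = b[8] | (b[9] << 8);
            return true;
        }

        // PNG: 8-byte signature, then the IHDR chunk (4-byte length,
        // "IHDR"), then width and height as 32-bit big-endian values
        // at offsets 16 and 20.
        if (b.Length >= 24 && b[0] == 0x89 && b[1] == 'P' && b[2] == 'N' && b[3] == 'G'
            && b[12] == 'I' && b[13] == 'H' && b[14] == 'D' && b[15] == 'R')
        {
            width = (b[16] << 24) | (b[17] << 16) | (b[18] << 8) | b[19];
            height = (b[20] << 24) | (b[21] << 16) | (b[22] << 8) | b[23];
            return true;
        }

        return false;
    }
}
```

Both checks fit comfortably inside a 32-byte buffer, which is why a small range request is enough for these two formats.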

To get the actual JPEG size, you need to scan for the start-of-frame marker. Unfortunately, one can't reliably determine in advance where this is going to be, and a thumbnail could push it several thousand bytes into the file.
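For illustration, the scan might look something like the sketch below. It walks the marker segments from the start of the buffer and skips each one by its declared length, so an embedded thumbnail inside an APPn segment is stepped over rather than mistaken for the main image. The names are hypothetical, and a return value of false can simply mean the buffer ended before the start-of-frame segment was reached.

```csharp
using System;

static class JpegDimensions
{
    // Scans JPEG marker segments for a start-of-frame (SOF) segment.
    // Returns false if the data is not a JPEG or the SOF segment is
    // not contained in the bytes supplied.
    public static bool TryRead(byte[] b, out int width, out int height)
    {
        width = 0;
        height = 0;
        if (b.Length < 4 || b[0] != 0xFF || b[1] != 0xD8) return false; // no SOI marker

        int i = 2;
        while (i + 4 <= b.Length)
        {
            if (b[i] != 0xFF) return false;          // not positioned on a marker
            byte marker = b[i + 1];
            if (marker == 0xFF) { i++; continue; }   // fill byte before a marker

            // Markers that carry no length field.
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) { i += 2; continue; }

            // End of image or start of scan: no SOF will follow.
            if (marker == 0xD9 || marker == 0xDA) return false;

            int length = (b[i + 2] << 8) | b[i + 3]; // big-endian, includes its own 2 bytes
            if (length < 2) return false;

            // SOF markers are 0xC0-0xCF except 0xC4 (DHT), 0xC8 (JPG), 0xCC (DAC).
            bool isSof = marker >= 0xC0 && marker <= 0xCF
                         && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
            if (isSof)
            {
                if (i + 9 > b.Length) return false;  // precision, height, width not all present
                height = (b[i + 5] << 8) | b[i + 6];
                width = (b[i + 7] << 8) | b[i + 8];
                return true;
            }

            i += 2 + length;                         // jump to the next marker
        }
        return false;
    }
}
```

Because the offset of the SOF segment varies from file to file, a fixed-size range request may or may not include it.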

I'd recommend using the range request to get the first 32 bytes. This will let you determine the file type. If it's a JPEG, download the whole file and use a library to get the size information.

brianegge
I think 99% of the images are going to be JPGs... seems like a lot of work to handle the 1% case. Good suggestion though, nice to know that you can do that. I guess what I can do is examine the file size and, if it meets a minimum size, download the whole file into memory, check its real image dimensions, and discard it if it's too small.
Mark
I guess you deserve the check for answering the *question* most thoroughly though. However, Matthew Lock answered the *problem* in his comment :)
Mark