I'm trying to write some code to extract Exif information from a JPG.

Exif is stored in the APP1 segment of a JPG file. According to the Exif spec, the format of the APP1 segment is supposed to start like this:

FF E1        // APP1 segment marker
nn nn        // Length of segment
45           // 'E'
78           // 'x'
69           // 'i'
66           // 'f'

And it goes until there is an FF followed by something other than FF or 00.

Looking at a JPG in a hex editor, I can see FF E1 and the Exif string, but I'm having trouble decoding the length bytes. An example: in one JPG, my hex editor tells me the APP1 segment is 686 bytes long, but the length bytes are F7 C8.

How should I use those bytes to come up with 686 decimal?

Edit: Here is the first part of the example file:

FF D8 FF E1 F7 C8 45 78 69 66 00 00 4D 4D 00 2A 00 00 00 08

Edit: Actually, I think I might know what's going on here. Does the APP1 segment actually "contain" other segments? For example, if the thumbnail data were considered to be inside APP1, then that length seems more reasonable. Can anyone confirm/deny this?

+1  A: 

You can't. Those bytes don't represent 686, but the length bytes do precede the "Exif" string.

Without seeing the file, it's impossible to diagnose. It might be good to post the first 20 hex bytes of your file.

Matthias Wandel
Done, please take a look at my edits.
Blorgbeard
+1  A: 

It turns out that the APP1 segment includes the thumbnail (see the linked EXIF document and scroll down to logical page 12), so the 686 is a red herring (probably the number of bytes until the thumbnail). The length bytes F7 C8 are a big-endian 16-bit value, 0xF7C8 = 63432 decimal, and that is the actual number of bytes until the DQT segment. It's so big because it includes the thumbnail.
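As a rough illustration of reading that length field, here is a minimal Python sketch that uses the 20 bytes posted in the question. The function name `read_app1_length` is made up for this example, and it assumes APP1 immediately follows the SOI marker (FF D8), as it does in this file; a general parser would have to walk the segments instead.

```python
import struct

# First 20 bytes of the example file quoted in the question.
header = bytes.fromhex(
    "FF D8 FF E1 F7 C8 45 78 69 66 00 00 4D 4D 00 2A 00 00 00 08"
)

def read_app1_length(data: bytes) -> int:
    """Return the APP1 length field (hypothetical helper).

    Assumes APP1 comes straight after SOI, which is true for the
    example file but not guaranteed for every JPG.
    """
    if data[0:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (FF D8)")
    if data[2:4] != b"\xff\xe1":
        raise ValueError("APP1 marker (FF E1) does not follow SOI")
    # ">H" means big-endian unsigned 16-bit integer.
    (length,) = struct.unpack(">H", data[4:6])
    return length

length = read_app1_length(header)

# The length field counts its own two bytes but not the FF E1 marker,
# so the next marker should start 4 + length bytes into the file.
next_marker_offset = 4 + length

print(length, next_marker_offset)
```

For the bytes above this gives a length of 63432, so the next marker would be expected at file offset 63436, which can be checked against the hex editor.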

Gabe