Is there a good way to see what format an image is, without having to read the entire file into memory?

Obviously this would vary from format to format (I'm particularly interested in TIFF files) but what sort of procedure would be useful to determine what kind of image format a file is without having to read through the entire file?

BONUS: What if the image is a Base64-encoded string? Any reliable way to infer it before decoding it?

+14  A: 

Most image file formats have a distinctive byte sequence (a "magic number") at the start of the file. The Unix file command inspects those leading bytes to decide what type of data a file contains. See the Wikipedia article on magic numbers in files and magicdb.org.
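
As a rough illustration of the idea, here is a minimal Python sketch that reads only the first few bytes of a file and compares them against a handful of well-known signatures. The function names (sniff_bytes, sniff_file) and the short signature table are made up for this example; the table is far from complete, and a match only means the file claims to be that format.

```python
# Leading-byte signatures for a few common image formats (not exhaustive).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),   # little-endian TIFF
    (b"MM\x00*", "tiff"),   # big-endian TIFF
]


def sniff_bytes(head):
    """Return a format name if `head` starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None


def sniff_file(path, nbytes=16):
    """Read only the first `nbytes` bytes of the file and sniff them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(nbytes))
```

Only 16 bytes are read no matter how large the file is. Formats that keep their signature elsewhere (Targa, for instance) would need a separate check.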

Greg Hewgill
Except for Targa, which has its magic number at the end, and some flavors of RAW, which are entirely indistinguishable from TIFF except that they don't decode (or vice versa).
plinth
A: 

Either use file on the *nix command line or read the initial bytes of the file yourself. Most formats have a unique header in the first few bytes. For example, TIFF's header looks something like this:

0x00000000: 4949 2a00 0800 0000
For more information on the TIFF file format, including what those bytes stand for, go here.

verix
Yikes, "something like" is dangerous. There are two valid TIFF headers: 49 49 2a 00 or 4d 4d 00 2a. The 49 49 form uses Intel byte ordering (little endian) through most of the file; 4d 4d uses Motorola byte ordering (big endian), which means that the 2a and 00 are reversed relative to Intel.
plinth
+1  A: 

A comprehensive site of file formats is available at:

http://www.wotsit.org

Mark Ingram
A: 

TIFFs will begin with either II or MM (Intel or Motorola byte ordering).
The TIFF 6 specification can be downloaded here and isn't too hard to follow.
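
To make the two header variants concrete, here is a small Python sketch that checks the 8-byte TIFF header: the II/MM byte-order mark, the number 42 read in that byte order, and the offset of the first image directory. The function name parse_tiff_header is invented for this example, and it only covers classic TIFF (BigTIFF uses 43 instead of 42 and is not handled here).

```python
import struct


def parse_tiff_header(head):
    """Return (byte_order, first_ifd_offset), or None if this is not a
    classic TIFF header. `head` is the first 8 bytes of the file."""
    if len(head) < 8:
        return None
    if head[:2] == b"II":
        endian = "<"        # Intel, little endian
    elif head[:2] == b"MM":
        endian = ">"        # Motorola, big endian
    else:
        return None
    # 16-bit version number followed by 32-bit offset of the first IFD,
    # both stored in the byte order announced by the first two bytes.
    version, ifd_offset = struct.unpack(endian + "HI", head[2:8])
    if version != 42:
        return None
    return ("little" if endian == "<" else "big", ifd_offset)
```

With the bytes 49 49 2a 00 08 00 00 00 this reports little-endian order and a first directory at offset 8. A passing check does not rule out the TIFF-based RAW flavors, since they carry the same header.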

hamishmcn
+2  A: 

Sure there is. As the others have mentioned, most images start with some sort of magic number, and those fixed leading bytes always encode to the same leading Base64 characters. Here are a couple of examples:

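A Bitmap will start with "Qk". (The third character is one of "0" to "3" and depends on the file size, so "Qk3" is only one of the possibilities.)

A JPEG will start with "/9j/".

A GIF will start with "R0l". (That's a zero as the second character.)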

And so on. It's not hard to take the different image types and work out what their magic numbers encode to. Just be careful: some formats have more than one magic number (TIFF, for example), so you need to account for each of them in your Base64 'translation code'.
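
An alternative to maintaining a table of Base64 prefixes is to decode only the first few characters and test the resulting bytes. The Python sketch below does that. It assumes a plain Base64 string in the standard alphabet with no "data:" URI prefix; the name sniff_base64 and the short signature table are invented for this example.

```python
import base64

# Leading-byte signatures for a few common image formats (not exhaustive).
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF8", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),
    (b"MM\x00*", "tiff"),
]


def sniff_base64(text, nchars=16):
    """Guess the image format from the start of a Base64 string."""
    # Every 4 Base64 characters decode to 3 bytes, so cut on a multiple of 4.
    chunk = text.strip()[: nchars - nchars % 4]
    try:
        head = base64.b64decode(chunk)
    except ValueError:      # binascii.Error is a subclass of ValueError
        return None
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None
```

Sixteen characters decode to twelve bytes, which is enough for the signatures listed, and the rest of the string is never touched.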

LarryF