If someone sends me a document (.pdf, .doc, .xls, .ppt, .ogg, .mp3, .png, etc.) without the extension, how can I determine the file type? The /usr/bin/file command doesn't always guess right, or it simply says that I have a Microsoft Office document. I would like to know the exact type so I can add the extension to the file name.

+3  A: 

Try mimetype(1).

For Perl, look at File::MimeInfo.

Can Berk Güder
+1  A: 

Use this in Perl: http://search.cpan.org/dist/File-Type/

dusoft
http://search.cpan.org/perldoc?File::Type
Brad Gilbert
+6  A: 

You can come up with your own rules by adding them to /etc/magic.

See man file for more details. It is tricky to always get these rules right; however, I have had reasonable success.

Paul Whelan
+1  A: 

Some of the other posters thus far appear to neglect a few things.

File::MimeInfo uses the same MimeInfo database used by 'file' to identify files, so that's unlikely to do anything different.

File::Type is likely to be more interesting, as it relies only on itself, although this leads to a comically long script full of 'if' statements. By its very nature, though, it is unlikely to cover things 'file' doesn't already cover.

The best you can do with unknown file types is to try cracking them open with a hex editor, or to run them through 'strings' and see if you recognise anything. If you do work out how to identify a file, you may wish to go for File::Type as your solution because, as far as I can make out, it's at least easy to extend.
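If neither a hex editor nor 'strings' is to hand, the same kind of inspection can be roughly approximated in a few lines of Python. This is only a sketch: the function names printable_strings and hex_dump are made up for illustration, and it only looks for runs of printable ASCII, so it will miss text stored in other encodings.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len long (a rough 'strings')."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def hex_dump(data: bytes, width: int = 16):
    """Format bytes as offset, hex columns and an ASCII gutter."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as '.' in the text gutter.
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text}")
    return "\n".join(lines)


def inspect_file(path, head_bytes: int = 64):
    """Print the first bytes as a hex dump, then any readable strings in the file."""
    with open(path, "rb") as handle:
        data = handle.read()
    print(hex_dump(data[:head_bytes]))
    for found in printable_strings(data):
        print(found)
```

Calling inspect_file on the mystery file and skimming the first line of the dump is often enough to spot a recognisable header.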

Kent Fredric
+1  A: 

I answered this for an ASP user a while back and it seemed to do the trick for him (well, he marked mine as the correct answer, if that's anything to go by):

http://stackoverflow.com/questions/450947/find-out-the-real-file-type/450972#450972

One way would be to check for certain signatures or magic numbers in the files. This page has a handy list of known file signatures and seems quite up to date:

http://www.garykessler.net/library/file_sigs.html
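As a rough illustration of the signature approach, here is a minimal Python sketch. The SIGNATURES table and the guess_extension and guess_file names are my own, for illustration only, and the table holds just a handful of well-known signatures. Note that the OLE2 and ZIP signatures only identify the container, so on their own they cannot tell a .doc from an .xls, or a .docx from a plain .zip.

```python
# Hypothetical lookup table: (offset, magic bytes, suggested extension or label).
# Only a few common signatures are listed; extend it from a signature list as needed.
SIGNATURES = [
    (0, b"%PDF-", ".pdf"),
    (0, b"\x89PNG\r\n\x1a\n", ".png"),
    (0, b"OggS", ".ogg"),
    (0, b"ID3", ".mp3"),  # MP3 with an ID3v2 tag; untagged MP3s start differently
    (0, b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (.doc/.xls/.ppt)"),
    (0, b"PK\x03\x04", "ZIP container (.zip/.docx/.xlsx/.pptx/.odt)"),
]


def guess_extension(data: bytes):
    """Return the label of the first matching signature, or None if nothing matches."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


def guess_file(path, head_bytes: int = 64):
    """Read the start of a file and compare it against the signature table."""
    with open(path, "rb") as handle:
        return guess_extension(handle.read(head_bytes))
```

A None result just means the file starts with something the table does not know about, which is the point at which a longer signature list or a custom rule becomes useful.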

Kev
