I want a Python script to behave differently depending on the type of file it is given. I cannot rely on the filename extension, since it may be missing or misleading. I could call the `file` utility and parse its output, but I would rather use something built into Python for portability.

So is there anything in Python that uses heuristics to deduce the type of a file from its contents?

+6  A: 

Probably others as well. "magic" is the magic keyword to search for. ;-)

Alex Brasetvik
`libmagic` isn't perfect for all files. It looks at the "magic number" in a file header. Text files, such as source code, don't have headers, so `libmagic` has to resort to wild guessing, and it can be very wrong about them.
THC4k
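To illustrate why headerless text is hard to classify, here is a rough sketch of the kind of guess a sniffer is left with once no magic number matches. The name `looks_like_text`, the byte set, and the 30% threshold are all assumptions made for this example; this is not libmagic's algorithm.

```python
def looks_like_text(data: bytes, threshold: float = 0.30) -> bool:
    """Crude guess: text has no NUL bytes and few other control bytes."""
    if not data:
        return True  # assumption: treat an empty sample as text
    if b"\x00" in data:
        return False  # NUL bytes almost never appear in 8-bit text
    # Bytes we accept as "texty": common control characters, printable
    # ASCII, and all high bytes (they may be UTF-8 or Latin-1 text).
    text_bytes = bytes(
        {7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x7F)) | set(range(0x80, 0x100))
    )
    # Delete every "texty" byte; whatever remains counts against the sample.
    nontext = data.translate(None, text_bytes)
    return len(nontext) / len(data) <= threshold
```

Even this simple rule misfires in both directions: UTF-16 text contains NUL bytes and is rejected, while binary data that happens to avoid NULs and control bytes is accepted. Telling Python source from C source or plain prose takes further guesswork on top of this.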
Such is the danger of all content-sniffing approaches. Often the number of ‘acceptable’ file types is smaller than the list known by libmagic, in which case ad-hoc app-level sniffing can be a better bet, but for the general case there's not much you can do about it.
bobince
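A minimal sketch of the ad-hoc, app-level sniffing mentioned above, using only the standard library. The function names (`sniff_bytes`, `sniff_file`) and the choice of "acceptable" types are assumptions for illustration; a real application would list only the formats it actually handles.

```python
# Leading-byte signatures for a small, fixed set of accepted formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also matches ZIP-based formats such as .docx or .jar
    (b"\x1f\x8b", "application/gzip"),
]


def sniff_bytes(head: bytes):
    """Return a type label for the leading bytes, or None if unrecognized."""
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return None


def sniff_file(path):
    """Read only the first few bytes of the file and sniff them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))
```

Because the table is short, an unrecognized file yields `None` rather than a wrong guess, and the caller can reject it or fall back to another strategy.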
`libmagic` is what `file` uses, so it's very, very hard to find a closer match to `file`.
Ignacio Vazquez-Abrams