I am currently using PIL.

from PIL import Image

try:
    im = Image.open(filename)
    # do stuff
except IOError:
    # filename is not an image file that PIL can open
    pass

However, while this covers most cases, some image files such as xcf, svg and psd are not detected. Psd files raise an OverflowError exception.

Is there some way I could include them as well?

+1  A: 

Would checking the file extensions be acceptable or are you trying to confirm the data itself represents an image file?

If checking the file extension is enough, a regular expression or a simple comparison could satisfy the requirement.
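A minimal sketch of the simple-comparison approach, assuming a hand-picked set of extensions. The names `IMAGE_EXTENSIONS` and `has_image_extension` are made up for illustration, and this only inspects the file name, not the data:

```python
import os

# Hypothetical allow-list; adjust it to the formats you care about.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".psd", ".xcf", ".svg"}


def has_image_extension(filename):
    """Return True if the file name ends in one of the listed extensions."""
    _, ext = os.path.splitext(filename)
    # Compare case-insensitively so "photo.JPG" is accepted too.
    return ext.lower() in IMAGE_EXTENSIONS
```

A renamed text file would still pass this check, so it is only useful as a first filter.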

Doomspork
Simply checking the extension won't suffice, as one can rename a txt file to jpg or something. I guess I will fall back to extension checking for xcf and svg only if I can find no other solution.
Sujoy
Understandable, I was just hoping for some clarification before I proceeded to devise a solution that might better suit your needs. Thanks!
Doomspork
+4  A: 

Often the first few bytes of a file are a magic number that identifies the file format. You could check for this in addition to the exception handling above.
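As a rough sketch of that idea, the snippet below compares the start of a file against a few well-known signatures, including `8BPS` for psd and `gimp xcf ` for xcf. The table is deliberately short, and the names `SIGNATURES`, `sniff_bytes` and `sniff_file` are hypothetical, not part of PIL:

```python
# Leading bytes ("magic numbers") for a handful of formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"8BPS", "psd"),
    (b"gimp xcf ", "xcf"),
]


def sniff_bytes(header):
    """Return a format name if header starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read the first few bytes of the file at path and sniff them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))
```

A match only says the file starts like that format; it does not show that the rest of the file is intact.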

Brian R. Bondy
That won't be sufficient if he's really testing for "valid" images; the presence of a magic number doesn't guarantee that the file hasn't been truncated, for example.
Ben Blank
Excellent advice, now I just need to figure out what those numbers are. Thanks :)
Sujoy
@ben, ouch, I didn't think of that yet. That's a good point indeed.
Sujoy
@Ben, how would you expect a library to infer a file has been truncated?
@Ben Blank: True, but solving a problem 99% of the way is often better than not solving it at all.
Brian R. Bondy
A: 

Well, I do not know about the insides of psd, but svg is not an image file per se: it is based on xml, so it is essentially a plain text file.
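Since svg is xml, one way to recognise it is to parse the data and look at the root element. This is only a sketch using the standard library's `xml.etree.ElementTree`; the helper name `looks_like_svg` is made up, and accepting a root `svg` element without the namespace is an assumption about how lenient you want to be:

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def looks_like_svg(data):
    """Return True if data (bytes or str) parses as XML with an svg root element."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed XML, so it cannot be an svg document.
        return False
    # ElementTree writes namespaced tags as "{namespace}localname".
    return root.tag in ("{%s}svg" % SVG_NAMESPACE, "svg")
```

This parses the whole document, so it also rejects svg files that are truncated or otherwise not well-formed.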

shylent
aha, you are right. it is xml. however, it contains some image data embedded in it.
Sujoy
+1  A: 

On Linux, you could use python-magic (http://pypi.python.org/pypi/python-magic/0.1) which uses libmagic to identify file formats.

AFAIK, libmagic looks into the file and tries to tell you more about it than just the format, like bitmap dimensions, format version, etc. So you might see this as a superficial test for "validity".

For other definitions of "valid" you might have to write your own tests.

fmarc
+2  A: 

In addition to what Brian is suggesting, you could use PIL's verify method to check if the file is broken.

im.verify()

Attempts to determine if the file is broken, without actually decoding the image data. If this method finds any problems, it raises suitable exceptions. This method only works on a newly opened image; if the image has already been loaded, the result is undefined. Also, if you need to load the image after using this method, you must reopen the image file.

Nadia Alramli
Well, the main problem is that svg, xcf and psd files cannot be opened with Image.open(), hence there is no chance of verifying them with im.verify().
Sujoy
+1  A: 

You could use python-magic, the Python bindings to libmagic, and then check the MIME types. This won't tell you whether the files are corrupted or intact, but it should be able to determine what type of image each one is.

Kamil Kisiel
+1  A: 

I have just found the imghdr module in the standard library. From the Python documentation:

The imghdr module determines the type of image contained in a file or byte stream.

This is how it works:

>>> import imghdr
>>> imghdr.what('/tmp/bass')
'gif'

Using an existing module is much better than reimplementing similar functionality.

Nadia Alramli
Yes, imghdr works for most image formats but not all. As for my original problem with svg, xcf and psd files, those are undetected by imghdr as well.
Sujoy
Yes, but instead of reinventing the wheel there is something to start with.
Nadia Alramli
You can, for example, refuse undetected image headers. If the image was not detected by imghdr, it is probably not supported by PIL either. Or you can start by looking at the imghdr source code and see how it works.
Nadia Alramli