Using a Java servlet, is it possible to detect the true file type of a file, regardless of its extension?

Scenario: You only allow plain text file uploads (.txt and .csv). The user takes a file, mypicture.jpg, renames it to mypicture.txt, and uploads it. Your servlet expects only text files and blows up trying to read the JPEG.

Obviously this is user error, but is there a way to detect that it's not plain text and stop processing?

+3  A: 

No. There is no way to know in advance what type of file is being uploaded to you. You must perform all verification on the server before taking any action with the file.

Andrey
+4  A: 

You can do this using the built-in URLConnection#guessContentTypeFromStream() API. However, it can detect only a limited set of content types, so you may be better off with a third-party library such as jMimeMagic.
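
As a rough illustration of the built-in approach, the sketch below passes the first bytes of an upload to URLConnection.guessContentTypeFromStream(). The class and method names (UploadSniffer, sniff) are made up for this example. The stream must support mark/reset, which is why a BufferedInputStream is used. A null result only means that no known signature matched. It does not prove the content is text.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical helper, not part of any servlet API.
final class UploadSniffer {

    /**
     * Returns the MIME type guessed from the leading bytes of the stream,
     * or null when no known signature matches (typical for plain text).
     * The stream is reset afterwards, so it can still be read from the start.
     */
    static String sniff(BufferedInputStream in) {
        try {
            // Peeks at the first few bytes using mark/reset.
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a servlet you might wrap the uploaded part's input stream in a BufferedInputStream, call sniff() on it, and reject the upload if the result is non-null and not a text type.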

BalusC
A: 

What exactly do you mean by "plain text file"? Would a file consisting of Chinese text be a plain text file? If you assume English text in ASCII or an ANSI encoding, you would have to read the whole file as binary and check that, for example, all byte values are between, say, 32 and 127, plus perhaps 13, 10, and 9.
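
A minimal sketch of that byte-range check, assuming strict ASCII is what you want. The class and method names are made up for this example. It accepts bytes 32 through 126 plus tab (9), LF (10), and CR (13). Excluding 127 (DEL) is a choice made here, not a rule. Any UTF-8 or other non-ASCII text would be rejected by this test.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper illustrating the byte-range idea.
final class PlainTextCheck {

    /** Returns true if every byte is printable ASCII, tab, LF, or CR. */
    static boolean isPlainAscii(InputStream in) {
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    int b = buf[i] & 0xFF; // treat the byte as unsigned
                    boolean printable = b >= 32 && b <= 126;
                    boolean whitespace = b == 9 || b == 10 || b == 13;
                    if (!printable && !whitespace) {
                        return false; // stop at the first offending byte
                    }
                }
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```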

Frank
+1  A: 

I think you should consider why your program might blow up when given a JPEG (say) and make it defensive against this. For example, a JPEG file is likely to have apparently very long lines (any LF or CR LF will be somewhat randomly spread). But a so-called text file could equally have long lines that might kill your program.
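
One way to be defensive about line length, sketched under my own assumptions: read character by character and give up as soon as a line exceeds a limit you choose. The class name, method name, and the maxLineLength parameter are all hypothetical. The limit counts every character before the LF, including a trailing CR. Pass in a buffered Reader, since this reads one character at a time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads lines but refuses unreasonably long ones.
final class BoundedLineReader {

    /** Reads all lines, throwing IllegalArgumentException if one is too long. */
    static List<String> readLines(Reader reader, int maxLineLength) {
        try {
            List<String> lines = new ArrayList<>();
            StringBuilder line = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\n') {
                    lines.add(stripTrailingCr(line));
                    line.setLength(0);
                } else {
                    if (line.length() >= maxLineLength) {
                        throw new IllegalArgumentException(
                                "line longer than " + maxLineLength + " characters");
                    }
                    line.append((char) c);
                }
            }
            if (line.length() > 0) {
                lines.add(stripTrailingCr(line)); // last line without a newline
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String stripTrailingCr(StringBuilder sb) {
        int len = sb.length();
        return (len > 0 && sb.charAt(len - 1) == '\r') ? sb.substring(0, len - 1) : sb.toString();
    }
}
```

The caller can catch the exception and report a bad upload instead of running out of memory on a single enormous "line".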

justintime