I am attempting to detect the file type of a library of files on our webserver as we are implementing code that is designed to stream files to the browser securely. Previously, the files were being stored and presented to users via a direct href.

I have attempted to do this 3 different ways, all on my local machine (which is NOT a simulated production environment):

  1. Setting a variable to be the value of what is returned from the function getPageContext().getServletContext().getMimeType(). This detects some but not all mime types for files.

  2. Creating an object from coldfusion.util.MimeTypeUtils and calling function guessMimeType(). This also detects some but not all mime types for files.

  3. A cffile action="read" on files in the library. This is the solution my boss recommended, as he has used this on files with cffile action="upload" from a form (and says it works), but when I use it, the cffile structure is always blank.

Ideally, I want to retrieve the mime type of every file located on the server with 100% accuracy. The code I have written has detected approximately 99% of the files on my copy of the repo, leaving about 30 that it can't identify. These include MS Office files with the new -x extensions and tgz compressed files.

I am wondering whether there is a sure-fire way to detect the mime-type of any given file that exists on a server by using CF code to look at it, and whether that code will work on a production server where very few applications are installed. It is my understanding that the first function I referenced uses the mime-type library of the OS, and the 2nd uses a predetermined list of mime-types in the Java object. Searching on Google and SO has not produced anything that tells me that CF can accurately detect file mime types on its own, nor have I seen anything that says this can't be done.

Edit: This is on a CF8 environment.

+2  A: 

There will not be a 100% guaranteed sure-fire way because mime types are arbitrary mappings.

You will need to use somebody's mappings, whether it's the OS's or the JVM's.

It will be your responsibility to fill in any blanks that either the OS or the JVM has in mappings, and keep that up to date.

But, I will always be able to create some file, give it an extension of .xyzzy, and you'll not be able to find out the 'mime-type' of it.
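
As a rough illustration of "filling in the blanks", here is a minimal Java sketch (CF8 runs on the JVM, so something along these lines could be called from CF). The class name `MimeFallback`, the `EXTRA` table and its entries are hypothetical and would have to be maintained by hand; `URLConnection.guessContentTypeFromName` is the JDK's own extension lookup, used here only as a stand-in for whichever platform mapping you already rely on.

```java
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MimeFallback {
    // Hypothetical hand-maintained table for extensions the platform lookup misses.
    private static final Map<String, String> EXTRA = new HashMap<String, String>();
    static {
        EXTRA.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        EXTRA.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        EXTRA.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
        EXTRA.put("tgz", "application/gzip");
    }

    // Returns a MIME type for the file name, never null.
    public static String lookup(String fileName) {
        // 1. Our own table wins, so a local entry can also correct a bad platform mapping.
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String extra = EXTRA.get(ext);
        if (extra != null) {
            return extra;
        }
        // 2. Otherwise ask the JDK's built-in extension map (may return null).
        String guessed = URLConnection.guessContentTypeFromName(fileName);
        if (guessed != null) {
            return guessed;
        }
        // 3. Unknown extension: fall back to the generic binary type.
        return "application/octet-stream";
    }
}
```

A file such as `mystery.xyzzy` still falls through to the generic type, which is the point made above: nothing in the mapping knows about it until someone adds an entry.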

Edward M Smith
This only seems half right. I'm suspecting that mime-types aren't stored in files, but arbitrary assignment doesn't sound correct either. Sure, you can create your own random file extension, but if you wanted to programmatically use it, there's got to be a way to associate your file type with your application beyond setting up the extension and mime-type in your OS's mime-type db/list.
KeRiCr
Perhaps 'arbitrary' was the wrong word. I meant it in the sense of not being programmatic. You can't look at a file you've never seen before and determine the proper mime type. The canonical registration of mime types is here: http://www.iana.org/assignments/media-types/. The method of determining mime type is generally by looking up the file extension or the magic number (http://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files) against a database of registered types.
Edward M Smith
Ok, I understand now. Seeing how it's external metadata, I see how it's impossible to get them all the time. I can deal with manually entering a small handful of mime-types and I can implement validation to either only accept currently detectable mime-types or in the case of the unknown MS Office files, just programmatically assign the correct mimetype if the browser can't. Thanks for the help.
KeRiCr
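
For reference, the magic-number lookup mentioned in the comments can be sketched in Java as below. This is only an illustration under stated assumptions: the class name `MagicSniffer` and its three-entry signature list are hypothetical, and a real lookup would need a far larger database of signatures. The caller is assumed to have already read the first few bytes of the file into an array.

```java
public class MagicSniffer {
    // Returns a MIME type for a known leading signature, or null if none matches.
    public static String sniff(byte[] head) {
        int n = head.length;
        // gzip streams (including .tgz archives) start with 0x1F 0x8B.
        if (n >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) {
            return "application/gzip";
        }
        // ZIP local file header: 'P' 'K' 0x03 0x04.
        if (n >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
            return "application/zip";
        }
        // PDF files start with the ASCII text "%PDF".
        if (n >= 4 && head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F') {
            return "application/pdf";
        }
        return null; // Unknown signature: the caller falls back to the extension.
    }
}
```

Note that the newer MS Office formats are ZIP containers, so a signature check alone reports them as `application/zip`; telling a .docx from a plain .zip still needs the extension or a look inside the archive.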