I have a table with a binary column which stores files of a number of different possible filetypes (PDF, BMP, JPEG, WAV, MP3, DOC, MPEG, AVI etc.), but no columns that store either the name or the type of the original file. Is there any easy way for me to process these rows and determine the type of each file stored in the binary column? Preferably it would be a utility that only reads the file headers, so that I don't have to fully extract each file to determine its type.

Clarification: I know that the approach here involves reading just the beginning of each file. I'm looking for a good resource (aka links) that can do this for me without too much fuss. Thanks.

Also, just C#/.NET on Windows, please. I'm not using Linux and can't use Cygwin (doesn't work on Windows CE, among other reasons).

+6  A: 

This is not a complete answer, but a place to start would be a "magic numbers" library. Such a library examines the first few bytes of a file to find its "magic number", which it compares against a list of known values. This is (at least part of) how the file command on Linux systems works.
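As a rough illustration of the idea in C#, here is a minimal sketch that compares the leading bytes of a buffer against a handful of well-known signatures. The class and method names (MagicSniffer, Guess) are made up for this example, and the signature list is far from complete: the OLE2 signature is shared by legacy DOC, XLS and PPT files, and the two-byte checks for BMP and for a bare MP3 frame sync are weak and can produce false positives.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library mentioned in this thread.
public static class MagicSniffer
{
    // True if 'data' contains 'signature' starting at 'offset'.
    private static bool HasBytes(byte[] data, int offset, byte[] signature)
    {
        if (data.Length < offset + signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[offset + i] != signature[i]) return false;
        }
        return true;
    }

    private static bool HasAscii(byte[] data, int offset, string text)
    {
        return HasBytes(data, offset, Encoding.ASCII.GetBytes(text));
    }

    // Returns a short type label based only on the leading bytes (12 are enough here).
    public static string Guess(byte[] header)
    {
        if (header == null) return "unknown";

        if (HasAscii(header, 0, "%PDF")) return "pdf";
        if (HasBytes(header, 0, new byte[] { 0xFF, 0xD8, 0xFF })) return "jpeg";
        if (HasBytes(header, 0, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A })) return "png";

        // RIFF containers carry their subtype at offset 8.
        if (HasAscii(header, 0, "RIFF") && HasAscii(header, 8, "WAVE")) return "wav";
        if (HasAscii(header, 0, "RIFF") && HasAscii(header, 8, "AVI ")) return "avi";

        // OLE2 compound file: legacy Word, Excel and PowerPoint all start this way.
        if (HasBytes(header, 0, new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 })) return "ole2";

        // MPEG program stream pack header or MPEG video sequence header.
        if (HasBytes(header, 0, new byte[] { 0x00, 0x00, 0x01, 0xBA })) return "mpeg";
        if (HasBytes(header, 0, new byte[] { 0x00, 0x00, 0x01, 0xB3 })) return "mpeg";

        // MP3 with an ID3v2 tag, or a bare frame sync (11 set bits). The latter is a weak test.
        if (HasAscii(header, 0, "ID3")) return "mp3";
        if (header.Length >= 2 && header[0] == 0xFF && (header[1] & 0xE0) == 0xE0) return "mp3";

        // Two-byte signature, so checked last.
        if (HasAscii(header, 0, "BM")) return "bmp";

        return "unknown";
    }
}
```

Calling MagicSniffer.Guess on the first dozen or so bytes of each stored value would give a first-pass classification; anything reported as "unknown" or "ole2" would need a closer look.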

Paul Fisher
This will be complete enough for me if you can point me to a good library like this.
MusiGenesis
Look for /usr/share/file/magic, /etc/magic or various other similar locations on a Linux or Unix distro. As the other poster says, you can also get this with Cygwin.
ConcernedOfTunbridgeWells
-1 temporary downvote (I'll remove it in a bit). I need something for .NET, and sometimes on questions like this 1000 people will see "file" and "Linux" and upvote one answer, which blocks viewers from noticing any others. Nothing personal. :)
MusiGenesis
"file" does make me wish I were working on Linux. :)
MusiGenesis
+2  A: 

The easiest way I know is to use the file command, which is also available on Windows through Cygwin.

Fernando Miguélez
+1  A: 

A lot of filetypes have well-defined headers at the start of the file. You could check the first few bytes to see how the file begins.

jjnguy
+1  A: 

The easiest way to do this would be through access to a *nix (or Cygwin) system that has the 'file' command:

$ file visitors.*
visitors.html: HTML document text
visitors.png:  PNG image data, 5360 x 2819, 8-bit colormap, non-interlaced

You could write a C# application that pipes the first X bytes of each binary column to the file command (using - as the file name, so that file reads from standard input).
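A minimal sketch of that piping approach might look like the following. It assumes a file executable is on the PATH (for example from Cygwin), which the asker's environment may not allow; the class and method names (FileCommand, Identify) are invented for this example. The -b option asks file for brief output without the file name prefix.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical wrapper; assumes a 'file' executable can be found on the PATH.
public static class FileCommand
{
    // Sends the given bytes to "file -b -" and returns its description of them.
    public static string Identify(byte[] header)
    {
        ProcessStartInfo info = new ProcessStartInfo("file", "-b -");
        info.UseShellExecute = false;        // required for stream redirection
        info.RedirectStandardInput = true;
        info.RedirectStandardOutput = true;
        info.CreateNoWindow = true;

        using (Process process = Process.Start(info))
        {
            // Write raw bytes to the underlying stream, bypassing text encoding.
            Stream stdin = process.StandardInput.BaseStream;
            stdin.Write(header, 0, header.Length);
            stdin.Flush();
            stdin.Close();                   // signals end of input to 'file'

            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output.Trim();
        }
    }
}
```

Starting one process per row is slow, so this is probably only reasonable for a one-off pass over the table.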

thelsdj
+4  A: 

Someone else asked a similar question and posted the code used to do exactly this. You should be able to take what is posted here, and slightly modify it so that it pulls from your database.

http://stackoverflow.com/questions/58510
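For the "pull from your database" part, one possible approach is to read only the leading bytes of the binary column through the provider-neutral IDataRecord.GetBytes method. The sketch below is an illustration under assumptions: the BlobHeader class name is made up, and the column ordinal and byte count are whatever your own query and signature checks need.

```csharp
using System;
using System.Data;

// Hypothetical helper for reading just the start of a binary column.
public static class BlobHeader
{
    // Reads at most maxBytes from the beginning of the column at 'ordinal'.
    // Returns a shorter array if the stored value is smaller than maxBytes.
    public static byte[] Read(IDataRecord record, int ordinal, int maxBytes)
    {
        if (record.IsDBNull(ordinal)) return new byte[0];

        byte[] buffer = new byte[maxBytes];
        int read = (int)record.GetBytes(ordinal, 0, buffer, 0, maxBytes);
        if (read == maxBytes) return buffer;

        // Trim the buffer to the number of bytes actually available.
        byte[] trimmed = new byte[read];
        Array.Copy(buffer, trimmed, read);
        return trimmed;
    }
}
```

With SqlClient, opening the reader with CommandBehavior.SequentialAccess lets the provider stream large column values instead of loading each whole row, so only the requested bytes need to be copied out. Other providers may behave differently.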

In addition, it looks like someone has written a library based on magic numbers to do this. However, the site appears to require registration, and some form of alternate access, in order to download the library. The documentation is available for free without registration, which may be helpful.

http://software.topcoder.com/catalog/c_component.jsp?comp=13249160&ver=2

Bob
That topcoder link does not allow download even after a ridiculous registration process - don't go near!
Brendan
+8  A: 

You can use these tools to find the file format.

File Analyser http://www.softpedia.com/get/Programming/Other-Programming-Files/File-Analyzer.shtml

What Format http://www.jozy.nl/whatfmt.html

PE file format analyser http://peid.has.it/

This website may be helpful for you. http://mark0.net/onlinetrid.aspx

Note: I have included the download links to make sure that you are getting the right tool name and information.

Please verify the source before you download them.

I have used a tool in the past (I think it was File Analyser) that will tell you the closest match.

Happy tooling.

sundar venugopal
Thank you for the links. I'm going to check them all out.
MusiGenesis
+1  A: 

You need to use some P/Invoke interop code to call the SHGetFileInfo function from the Win32 API. This article may also help.

Scott Dorman
I think this will return whatever the server thinks the file type is (which will probably be ok), whereas I need a method that determines the same file type no matter what server it runs on.
MusiGenesis
Yes, it will return what the server thinks the file type is. This is the same information that you would see in Windows Explorer for the "Type" column. The only way to know on any server is to write your own parsing routine to look at file extension, PE data, and file headers.
Scott Dorman
+1  A: 

SHGetFileInfo is of absolutely no use here.