I am working on a document that requires a user to upload a Microsoft Word document.

Apart from checking that the file extension is .doc or .docx, is there any other way I can verify that the uploaded file is actually a Microsoft Word document and not some other file renamed to a .doc or .docx extension?

Thanks in advance.

+4  A: 

A .docx file is a set of XML files compressed using the standard zip compression scheme. So you could try passing it to an unzip routine to see whether it decompresses, then look at the relevant XML file inside and check for the fields you would expect to find in a document.
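A minimal sketch of that idea, assuming PHP's zip extension is available and that the main document part sits at the conventional word/document.xml path (the helper name looksLikeDocx is made up for illustration). It only checks that the expected entries exist; it does not validate the XML inside them.

```php
<?php
// Hypothetical helper: returns true when $path is a zip archive that
// contains the entries a Word .docx package normally has.
function looksLikeDocx(string $path): bool
{
    $zip = new ZipArchive();

    // open() returns true on success and an integer error code otherwise.
    if ($zip->open($path) !== true) {
        return false;
    }

    // Every Open XML package carries a [Content_Types].xml entry; Word
    // conventionally stores the main document at word/document.xml.
    $hasContentTypes = $zip->locateName('[Content_Types].xml') !== false;
    $hasMainDocument = $zip->locateName('word/document.xml') !== false;

    $zip->close();

    return $hasContentTypes && $hasMainDocument;
}
```

A renamed image or text file fails at the open() step, while an arbitrary zip renamed to .docx fails the entry checks.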

Amber
+2  A: 

You could try:

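$type = shell_exec('file -bi ' . escapeshellarg($UploadedFilePath));

That launches the Linux file utility, which inspects the file contents and reports which file type it is. The path is passed through escapeshellarg so that an unusual file name cannot inject shell commands.

It works with many file types (we use it in production code to detect uploaded files), though I am not sure how well it handles the different Microsoft Word document versions.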

Patonza
+5  A: 

If you are not using PHP 5.3, the mime_content_type function might interest you.

If you are using PHP 5.3 and/or can install PECL extensions, the new Fileinfo library should do the job; see finfo_file for more information.
In the example given there, one of the identified MIME types is "application/vnd.ms-excel", so with a bit of luck it should be able to deal with MS Word files too ;-)
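Here is a rough sketch of the Fileinfo approach, assuming the extension is enabled. The helper name hasWordMimeType and the allow-list are my own choices; the two MIME types are the registered ones for .doc and .docx, but the underlying magic database may report something more generic (for example application/zip for some .docx files), so treat the list as a starting point.

```php
<?php
// Hypothetical helper: asks Fileinfo for the MIME type detected from the
// file contents and compares it against an assumed allow-list.
function hasWordMimeType(string $path): bool
{
    // Assumed allow-list; extend it if your magic database reports
    // other values for genuine Word files.
    $allowed = [
        'application/msword',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    ];

    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    if ($finfo === false) {
        return false; // magic database could not be loaded
    }

    // finfo_file() returns the MIME type string, or false on failure.
    $mime = finfo_file($finfo, $path);

    return $mime !== false && in_array($mime, $allowed, true);
}
```

For an upload you would pass the temporary path, e.g. hasWordMimeType($_FILES['doc']['tmp_name']), where 'doc' stands in for whatever your form field is called.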

Pascal MARTIN
+2  A: 

For Microsoft .doc files you can check the first few bytes of the file for the magic number:

D0 CF 11 E0 A1 B1 1A E1

and "subheaders" at byte offset 512.
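As a sketch, the signature check could look like the following (hasOleSignature is a hypothetical helper name). Note that these eight bytes identify the OLE2 compound file container, which legacy .xls and .ppt files also use, so a match only narrows things down; the subheader check at offset 512 is not covered here.

```php
<?php
// Hypothetical helper: returns true when the file starts with the
// eight-byte signature D0 CF 11 E0 A1 B1 1A E1.
function hasOleSignature(string $path): bool
{
    $signature = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

    // Open in binary mode; suppress the warning for unreadable paths
    // and report them as a non-match instead.
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    $header = fread($handle, 8);
    fclose($handle);

    // Strict comparison also rejects short reads and read failures.
    return $header === $signature;
}
```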

Mark Byers