I need to be able to identify that a given file is an OOXML file based on the contents of the file, and not on the file's extension.

OOXML files are really a collection of XML and text files in a zip container, which means that I cannot use the file's magic number as it will just indicate that it is a zip file.

So what I'm really asking is: are there any files that are required to be present in an OOXML Open Packaging Convention (OPC) container? If so, the presence of those files in an OPC container would indicate that it is likely to be an OOXML file, and their absence would indicate that it definitely is not an OOXML file.

This question is the OOXML version of this ODF question.

+1  A: 

My answer is similar to the one I gave to your ODF question: look at the technical specification of the format.

Amber
Yes, it looks like I can just use the [Content_Types].xml file, but I was hoping for a definitive indication of which files will **always** be in a valid OOXML file.
jwaddell
+1  A: 

Yes, there is a way. Go to OpenXMLDeveloper.org and download the PPTX titled "02: Open XML Packages" (Presentation 02). Slide 12 explains how to identify an Open XML document: look at document.xml (for a Word document), the rels files, and the [Content_Types].xml file (most importantly, its ContentType attributes). The important thing here is to use what's inside the files, not the file structure itself (Open Packaging Convention).
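As a rough illustration of that idea, here is a minimal Python sketch. It is not taken from the presentation. The function name `looks_like_ooxml` is made up for this example, and the check rests on a few assumptions: that the zip entries are named exactly `[Content_Types].xml` and `_rels/.rels` (the casing Office writes), and that a content type starting with `application/vnd.openxmlformats-officedocument.` is a good enough sign of OOXML. Treat a positive result as "likely OOXML", not as proof of a valid document.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the elements in [Content_Types].xml.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
# Assumption: OOXML parts declare content types with this prefix.
OOXML_PREFIX = "application/vnd.openxmlformats-officedocument."


def looks_like_ooxml(source):
    """Return True if source (a path or a seekable binary file object)
    looks like an OOXML package, judging by its contents only."""
    if not zipfile.is_zipfile(source):
        return False
    try:
        with zipfile.ZipFile(source) as zf:
            names = set(zf.namelist())
            # Assumption: exact-case entry names, as written by Office.
            if "[Content_Types].xml" not in names or "_rels/.rels" not in names:
                return False
            root = ET.fromstring(zf.read("[Content_Types].xml"))
    except (zipfile.BadZipFile, ET.ParseError):
        return False
    # Check the ContentType attribute of every Default and Override element.
    for tag in ("Default", "Override"):
        for elem in root.iter(CT_NS + tag):
            if elem.get("ContentType", "").startswith(OOXML_PREFIX):
                return True
    return False
```

Calling `looks_like_ooxml("report.bin")` on a renamed .docx should return `True` under these assumptions, while a plain zip archive or another kind of OPC package without those content types should return `False`.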

Another great resource is Open XML Markup Explained. Chapter 1, and then "Setting Up the Main Document", is a great place to learn about the structure of a Word docx. The structures of Excel and PowerPoint files are covered later on.

Otaku