
I'm opening Word documents with the Documents.Open method in the Microsoft.Office.Interop.Word namespace. This works fine, except that when I open a file that isn't a Word document, Word automatically converts it into one. I'd like to do one of the following: raise an exception if the file isn't a Word document, detect before opening whether the file is a Word document, or detect after opening whether the file was converted. Does anyone have any ideas about how to accomplish this?

A:

A simple test would be to check the magic number at the start of the file header before trying to open the document with Word.

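Binary Word documents (.doc) are compound documents (OLE2 structured storage) and start with the bytes D0 CF, which read as 0xCFD0 when treated as a little-endian UInt16, whereas Office Open XML documents (.docx) are ZIP packages and start with the string "PK".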

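using System.IO;

static class FileSignature
{
    // .doc files are OLE2 compound documents whose first two bytes are D0 CF;
    // BinaryReader reads them little-endian as the UInt16 value 0xCFD0.
    static bool HasCompoundDocumentSignature(string filename)
    {
        return ReadFirstUInt16(filename) == 0xCFD0;
    }

    // .docx files are ZIP packages whose first two bytes are "PK" (0x50 0x4B),
    // read little-endian as 0x4B50.
    static bool HasZipSignature(string filename)
    {
        return ReadFirstUInt16(filename) == 0x4B50;
    }

    public static bool HasWordSignature(string filename)
    {
        return HasCompoundDocumentSignature(filename)
            || HasZipSignature(filename);
    }

    // Returns the first two bytes as a little-endian UInt16, or null if the file is shorter than that.
    static ushort? ReadFirstUInt16(string filename)
    {
        // Open read-only and let other processes keep the file open.
        using (var stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var br = new BinaryReader(stream))
        {
            return stream.Length < 2 ? (ushort?)null : br.ReadUInt16();
        }
    }
}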
Answered by 0xA3
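The two-byte test matches only part of each signature. A slightly stricter sketch, assuming the standard header signatures: OLE2 compound files begin with the eight bytes D0 CF 11 E0 A1 B1 1A E1, and ZIP entries begin with a local file header "PK" 03 04. The names `ContainerKind`, `FileSignatures` and `Detect` are hypothetical. This still only identifies the container: a .xls or .ppt passes as a compound file, and any ZIP archive passes as a ZIP.

```csharp
using System.IO;
using System.Linq;

// Hypothetical names, for illustration only: ContainerKind, FileSignatures, Detect.
enum ContainerKind { Unknown, CompoundFile, Zip }

static class FileSignatures
{
    // OLE2 / Compound File Binary header signature (shared by .doc, .xls, .ppt, ...).
    static readonly byte[] CompoundFileSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ZIP local file header signature "PK\x03\x04" (shared by .docx, .xlsx, plain .zip, ...).
    static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static ContainerKind Detect(string filename)
    {
        // Read only as many bytes as the longest signature needs.
        byte[] header = ReadHeader(filename, CompoundFileSignature.Length);
        if (StartsWith(header, CompoundFileSignature)) return ContainerKind.CompoundFile;
        if (StartsWith(header, ZipSignature)) return ContainerKind.Zip;
        return ContainerKind.Unknown;
    }

    // Reads up to 'count' bytes from the start of the file; returns fewer if the file is shorter.
    static byte[] ReadHeader(string filename, int count)
    {
        using (var stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            var buffer = new byte[count];
            int total = 0;
            while (total < count)
            {
                int read = stream.Read(buffer, total, count - total);
                if (read == 0) break; // reached end of file
                total += read;
            }
            return buffer.Take(total).ToArray();
        }
    }

    static bool StartsWith(byte[] data, byte[] prefix)
    {
        return data.Length >= prefix.Length && data.Take(prefix.Length).SequenceEqual(prefix);
    }
}
```

A caller could skip Documents.Open entirely when `Detect` returns `ContainerKind.Unknown`, and fall through to a deeper check for the other two cases.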
Comment by jcnnghm: I know this is pretty close, but I didn't want to use something like this as a solution because any zip file would pass this test, so passing it doesn't necessarily mean the file is a Word file. I'm really looking for something in the API that will tell me whether any conversion was performed on the file when it was opened.
Comment by 0xA3: @jcnnghm Yes, this is certainly not bulletproof. If you find a zip file, you would have to open it, check for the _rels/.rels part, and parse the XML of that package part to see whether it contains a relationship of type "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument".
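The package check described in the last comment could look roughly like this. It is a sketch that assumes .NET 4.5 or later (`System.IO.Compression.ZipArchive`; on .NET Framework the System.IO.Compression assembly must be referenced) and LINQ to XML; the class and method names are hypothetical. Note that .xlsx and .pptx packages carry the same officeDocument relationship, so a match means "Office Open XML document", not specifically Word; narrowing it further would mean inspecting the target part's content type in [Content_Types].xml. Strict Open XML files use a different relationship-type namespace, which this sketch does not check.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
static class OpenXmlPackageCheck
{
    // Relationship type quoted in the comment (transitional Office Open XML).
    const string OfficeDocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Namespace of the <Relationships> element in OPC .rels parts.
    static readonly XNamespace RelNs = "http://schemas.openxmlformats.org/package/2006/relationships";

    public static bool HasOfficeDocumentRelationship(string filename)
    {
        try
        {
            using (var stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
            {
                // Package-level relationships live in the _rels/.rels part.
                ZipArchiveEntry rels = zip.GetEntry("_rels/.rels");
                if (rels == null) return false;

                using (var relsStream = rels.Open())
                {
                    XDocument doc = XDocument.Load(relsStream);
                    return doc.Root != null && doc.Root.Elements(RelNs + "Relationship")
                        .Any(r => (string)r.Attribute("Type") == OfficeDocumentRelType);
                }
            }
        }
        catch (InvalidDataException) { return false; }     // not a readable zip archive
        catch (System.Xml.XmlException) { return false; }  // _rels/.rels is not well-formed XML
    }
}
```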