How can I check whether a file is a Word template? I can check for the extension .dot or .dotx, but what if the user has simply renamed a .txt file to .dot? How can that be identified?

A: 

Check the file signature. A .dot file should start with D0 CF ...

Priyank Bolia
Could you please expand your answer?
Sauron
A: 

A .dotx file is just a ZIP file, so you can first check whether it starts with "PK" (the ZIP header). To be sure, you then need to decompress the file and check whether the contents are a valid Word template.
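
As a rough illustration of the second step, the sketch below opens the file as a ZIP archive and looks in the package's [Content_Types].xml for the content type that a .dotx declares for its main document part. The class and method names (DotxCheck, LooksLikeDotx) are made up for this example, and the check is only an assumption about what "valid enough" means here: it does not validate the document itself, and it does not cover macro-enabled .dotm templates.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class DotxCheck
{
    // Content type declared for the main document part of a .dotx package.
    private const string TemplateContentType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml";

    // Hypothetical helper: true if the stream is a ZIP package whose
    // [Content_Types].xml declares a Word template main part.
    public static bool LooksLikeDotx(Stream stream)
    {
        try
        {
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                ZipArchiveEntry entry = zip.GetEntry("[Content_Types].xml");
                if (entry == null)
                {
                    return false; // A ZIP file, but not an Open XML package.
                }

                using (Stream entryStream = entry.Open())
                {
                    XDocument doc = XDocument.Load(entryStream);
                    return doc.Descendants()
                        .Where(e => e.Name.LocalName == "Override")
                        .Any(e => (string)e.Attribute("ContentType") == TemplateContentType);
                }
            }
        }
        catch (InvalidDataException)
        {
            return false; // Not a readable ZIP archive.
        }
        catch (XmlException)
        {
            return false; // [Content_Types].xml is not well-formed XML.
        }
    }
}
```

A renamed .txt file fails at the ZipArchive constructor, and an ordinary ZIP file fails because it has no [Content_Types].xml entry or no template content type in it.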

Chris Thompson
Can you give some code for it?
Sauron
+1  A: 

To check for a dotx file (which is actually a zip file), check the header:

0000000: 504b 0304 1400 0000 0800 95a1 3435 4a07  PK..........45J.

The first four bytes are 0x50 0x4b 0x03 0x04. This only shows that the file is a zip file (so not necessarily a dotx). If you wanted to check further, you would need to unzip the entire buffer and parse the resulting XML.

To check for a dot file (pre 2007), check the header:

0000000: d0cf 11e0 a1b1 1ae1 0000 0000 0000 0000  ................

The first eight bytes are 0xd0 0xcf 0x11 0xe0 0xa1 0xb1 0x1a 0xe1.

So for either of these cases, open the file in binary mode, read the first eight bytes and compare.
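
A minimal sketch of that comparison, covering both signatures in one pass, might look like the following. The names ContainerKind, SignatureSniffer, Sniff and SniffFile are invented for this example. A result of Ole2 or Zip only tells you which container format the file starts with, not that it is actually a Word template.

```csharp
using System;
using System.IO;

public enum ContainerKind { Unknown, Ole2, Zip }

public static class SignatureSniffer
{
    // Signature of pre-2007 Office files such as .dot (8 bytes).
    private static readonly byte[] Ole2Signature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Signature of zip files, and therefore of .dotx (4 bytes).
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static ContainerKind Sniff(Stream stream)
    {
        byte[] header = new byte[8];
        int total = 0;

        // Read may return fewer bytes than requested, so loop until
        // the buffer is full or the stream ends.
        while (total < header.Length)
        {
            int read = stream.Read(header, total, header.Length - total);
            if (read == 0)
            {
                break;
            }
            total += read;
        }

        if (StartsWith(header, total, Ole2Signature)) return ContainerKind.Ole2;
        if (StartsWith(header, total, ZipSignature)) return ContainerKind.Zip;
        return ContainerKind.Unknown;
    }

    public static ContainerKind SniffFile(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return Sniff(fs);
        }
    }

    // True if the first `length` valid bytes of `data` begin with `signature`.
    private static bool StartsWith(byte[] data, int length, byte[] signature)
    {
        if (length < signature.Length)
        {
            return false;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, SignatureSniffer.SniffFile(path) on a .txt file that was renamed to .dot would normally return ContainerKind.Unknown, since plain text is unlikely to begin with either signature.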

Cannonade
It makes sense to use the whole signature, which is 4 bytes for zips, and 8 for dot.
Matthew Flaschen
Thanks Matthew, I updated to reflect your suggestion.
Cannonade
+1  A: 

According to http://www.garykessler.net/library/file_sigs.html, the full signature of a dot file (among others) is:

D0 CF 11 E0 A1 B1 1A E1

So, below is some code to start with. It works for .dot; if you want to check .dotx, you can implement similar code. A matching signature does not absolutely guarantee that the file is a valid dot, so you still need to handle errors reasonably later.

// Use this as a class field.
private static readonly byte[] DOT_SIGNATURE = new byte[]{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

Later, when you actually have the Stream:

bool isDot = true;

Stream dotStream = ...
byte[] firstBytes = new byte[DOT_SIGNATURE.Length];
int totalRead = 0, curRead;

// Read may return fewer bytes than requested, so loop until the buffer is full.
while (totalRead < DOT_SIGNATURE.Length)
{
    curRead = dotStream.Read(firstBytes, totalRead, DOT_SIGNATURE.Length - totalRead);
    if (curRead == 0)
    {
        isDot = false;
        break; // Premature end of stream.
    }

    totalRead += curRead;
}

if (isDot)
{
    for (int i = 0; isDot && i < DOT_SIGNATURE.Length; i++)
    {
        // If isDot becomes false, the arrays are not equal and the loop ends.
        isDot = (firstBytes[i] == DOT_SIGNATURE[i]);
    }
}

// Rewind so later code can read from the start (requires a seekable stream).
dotStream.Seek(0, SeekOrigin.Begin);
Matthew Flaschen
Can you give me the signature of a dotx file?
Sauron
Since dotx files are zip files, the signature is 50 4B 03 04 (see http://members.tripod.com/~petlibrary/ZIP.HTM). However, obviously not all zip files are dotx.
Matthew Flaschen