Hi, I am looking for a way to identify DOCX files even after they have been moved or renamed. The reason: I am playing with the Open XML SDK, building a hyperlink checker.

It works well so far, at least it can add or update hyperlinks in a document.

The problem is that if I rename an external file (source.docx links to target.docx, and target.docx is renamed to targetB.docx), the link is broken. I can find broken links (by simply checking whether the linked file is still at its given location).

But I want more. I want to be able to recover these lost links by searching all documents (docx) in a directory and checking whether each one is the "target". The simplest way should be a GUID stored somewhere in the document properties, which will not change if the document is renamed or edited (a checksum is not applicable, since it changes with every edit).

Then I would keep a separate list of links and their corresponding IDs, and if any document is renamed, I just update the link. I hope the concept is clear.
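To make the idea concrete, here is a rough sketch of the lookup side, in Python with only the standard library rather than the Open XML SDK. It assumes each target document already carries its ID as a custom document property in docProps/custom.xml. The property name "LinkTargetId" and the helper names read_doc_id, build_index and recover_link are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

CUSTOM_PART = "docProps/custom.xml"
NS = {
    "cp": "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties",
    "vt": "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes",
}
PROP_NAME = "LinkTargetId"  # hypothetical property name, not defined by Word


def read_doc_id(path):
    """Return the ID stored in a .docx file, or None if it has none."""
    try:
        with zipfile.ZipFile(path) as pkg:
            if CUSTOM_PART not in pkg.namelist():
                return None
            root = ET.fromstring(pkg.read(CUSTOM_PART))
    except (zipfile.BadZipFile, ET.ParseError, OSError):
        return None  # not a readable package, so skip it
    for prop in root.findall("cp:property", NS):
        if prop.get("name") == PROP_NAME:
            value = prop.find("vt:lpwstr", NS)
            return value.text if value is not None else None
    return None


def build_index(directory):
    """Map each stored ID to the current path of the document carrying it."""
    index = {}
    for path in sorted(Path(directory).rglob("*.docx")):
        doc_id = read_doc_id(path)
        if doc_id:
            index.setdefault(doc_id, path)  # first match wins on duplicate IDs
    return index


def recover_link(link_target, link_id, index):
    """Return the path a link should point to, or None if it cannot be recovered."""
    target = Path(link_target)
    if target.exists():
        return target  # link is still valid
    return index.get(link_id)  # look the renamed file up by its ID
```

With the link list stored as (target path, target ID) pairs, a broken link would be repaired by calling recover_link(path, doc_id, build_index(folder)) and writing the returned path back into the hyperlink.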

So there are a few basic questions:

  • Is there a "best practice" for storing this kind of custom information in an Open XML document?
  • Does a WordprocessingDocument (DOCX) already have a unique identifier created by Word?
  • Where would you save the mapping (hyperlink to GUID of its target)?

I hope the question is clear. If not, just leave a comment and I will try to clarify.

Thanks, Chris

A: 

Acrobat/PDF has something similar. Look up Bates numbering, which is used to identify documents by stamping a unique number on them.

You should typically place this in the metadata section, if there is one. Or add a custom part to the docx file that keeps the mapping (while remaining within the bounds of the spec, of course). (I am not very familiar with the docx format, so you'll have to figure out the details.)
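For what it's worth, here is a rough sketch of the "metadata section" route, in Python with only the standard library rather than the Open XML SDK. It assumes the ID goes into a custom document property, which lives in the docProps/custom.xml part and needs a content-type entry and a package relationship. The property name "LinkTargetId" and the function name stamp are made up, and the sketch only handles documents that have no custom properties yet.

```python
import uuid
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CUSTOM_PART = "docProps/custom.xml"
CUSTOM_CT = "application/vnd.openxmlformats-officedocument.custom-properties+xml"
CUSTOM_REL = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/custom-properties")
PROP_NAME = "LinkTargetId"  # hypothetical property name, not defined by Word

CUSTOM_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/'
    'custom-properties" xmlns:vt="http://schemas.openxmlformats.org/'
    'officeDocument/2006/docPropsVTypes">'
    '<property fmtid="{{D5CDD505-2E9C-101B-9397-08002B2CF9AE}}" pid="2" '
    'name="{name}"><vt:lpwstr>{value}</vt:lpwstr></property></Properties>'
)


def _append(xml_bytes, ns, tag, attrs):
    """Append one child element to a package XML file, keeping ns as default."""
    ET.register_namespace("", ns)  # global setting; avoids ns0: prefixes on output
    root = ET.fromstring(xml_bytes)
    ET.SubElement(root, "{%s}%s" % (ns, tag), attrs)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def stamp(src, dst):
    """Copy src to dst, adding a new GUID as a custom property; return the GUID."""
    doc_id = str(uuid.uuid4())
    with zipfile.ZipFile(src) as zin:
        if CUSTOM_PART in zin.namelist():
            raise ValueError("document already has custom properties")
        # Pick a relationship Id that is not in use yet.
        used = {r.get("Id") for r in ET.fromstring(zin.read("_rels/.rels"))}
        rel_id = next(f"rId{n}" for n in range(1, len(used) + 2)
                      if f"rId{n}" not in used)
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for item in zin.infolist():
                data = zin.read(item.filename)
                if item.filename == "[Content_Types].xml":
                    data = _append(data, CT_NS, "Override", {
                        "PartName": "/" + CUSTOM_PART, "ContentType": CUSTOM_CT})
                elif item.filename == "_rels/.rels":
                    data = _append(data, REL_NS, "Relationship", {
                        "Id": rel_id, "Type": CUSTOM_REL, "Target": CUSTOM_PART})
                zout.writestr(item, data)
            zout.writestr(CUSTOM_PART,
                          CUSTOM_XML.format(name=PROP_NAME, value=doc_id))
    return doc_id
```

Calling stamp("target.docx", "target-stamped.docx") returns the new GUID, which can then go into the link mapping. As far as I know, Word keeps custom properties when the document is edited and saved, but that is worth verifying on your own files before relying on it.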

dirkgently