I'm working on a little experimental utility to use within our company that indexes notes stored in our custom CRM software for full-text searching. These notes are stored in a Btrieve database (a file called NOTES.DAT). It's possible to connect to the database and retrieve the notes for indexing by using Pervasive's ADO.NET provider. However, the indexer currently loops through each note and re-indexes it every 5 minutes. This seems grossly inefficient.

Unfortunately, there's no way for our CRM software to signal to the indexing service that a note has been changed, because it's possible for the database to exist on a remote machine (and the developers aren't going to write a procedure to communicate with my service over a network, since it's just a hobby project for now).

Rather than give up, I'd like to take this opportunity to learn a little more about raw Btrieve databases. So, here's my plan...

The NOTES.DAT file has to be shared, since our CRM software uses the Btrieve API rather than the ODBC driver (which means client installations have to be able to see the file itself on the network). I would like to monitor this file (using something like FileSystemWatcher?) and then determine the bytes that were changed. Using that information, I'll try to calculate the record at that position and get its primary key. Then the indexer will update only that record using Pervasive's ADO.NET provider.

The problem (besides the fact that I don't quite know the structure of Btrieve files yet or if determining the primary key from the raw data is possible) is that I don't know how to determine the start and end range of bytes that were changed in NOTES.DAT.

I could diff two versions, but that would mean storing a copy of NOTES.DAT somewhere (and it can be quite large, hence the reason for a full-text indexing service).

What's the most efficient way to do this?

Thanks!

EDIT: It's possible for more than one note to be added, edited, or deleted in one transaction, so if possible, the method needs to be able to determine multiple separate byte ranges.

+1  A: 

If your NOTES.DAT file is stored on an NTFS partition, then you should be able to perform one of the following:

  • use the USN journal to identify changes to your file (preferred)
  • use the volume shadow copy service to track changes to your file by taking periodic snapshots through VSS (very fast), and then either:
    • diffing versions N and N-1 (probably not as slow as reindexing, but still slow), or
    • delving deeper and attempting to diff the $MFT to determine which blocks changed at which offsets for the file(s) of interest (much more complex, but also much faster - yet still not as fast, reliable and simple as using the USN journal)
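
For the "diff versions N and N-1" option, here is a minimal sketch in Python of how the comparison could report several separate byte ranges, as the question's EDIT asks. The function name `changed_ranges` and the 4096-byte `BLOCK_SIZE` are assumptions made for illustration; the block size is only a comparison granularity, not the Btrieve page size, and mapping a range back to a record is not attempted here.

```python
import io
from typing import BinaryIO, List, Tuple

BLOCK_SIZE = 4096  # assumed comparison granularity, NOT the Btrieve page size


def changed_ranges(old: BinaryIO, new: BinaryIO,
                   block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return half-open (start, end) byte ranges in which the two streams differ.

    Both streams are read block by block from their current position;
    adjacent differing blocks are merged into a single range.
    """
    ranges: List[Tuple[int, int]] = []
    offset = 0
    while True:
        a = old.read(block_size)
        b = new.read(block_size)
        if not a and not b:
            break  # both streams exhausted
        step = max(len(a), len(b))
        if a != b:
            end = offset + step
            if ranges and ranges[-1][1] == offset:
                # Extend the previous range instead of starting a new one.
                ranges[-1] = (ranges[-1][0], end)
            else:
                ranges.append((offset, end))
        offset += step
    return ranges


if __name__ == "__main__":
    # Hypothetical in-memory stand-ins for snapshots N-1 and N of NOTES.DAT.
    previous = bytes(10000)
    current = bytearray(previous)
    current[5] = 1      # change inside the first block
    current[9000] = 1   # change inside the last (partial) block
    print(changed_ranges(io.BytesIO(previous), io.BytesIO(bytes(current))))
```

With real snapshots you would pass two files opened in binary mode (`open(path, "rb")`) instead of the `BytesIO` objects. The result is only block-accurate: each range covers whole blocks that contain at least one differing byte, so a smaller block size gives tighter ranges at the cost of more comparisons.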

Using the USN journal should be your preferred method. You can use the FSUTIL utility (fsutil usn) to create, query, and delete the USN journal.

Cheers, V.

vladr