I have a web application that syncs Outlook contacts to a database (and back) via CDO. The DB contains every contact only once (at least in theory; duplicates do happen), providing a single point of change for a contact regardless of how many users have that particular contact in Outlook (similar to Interaction and comparable products).

The sync process is not automatic, but user-initiated. An arbitrary timespan can pass before users decide to sync their contacts. A subset of these contacts may have been updated by other users in the meantime.

Generally, this runs fine, but I have never been able to solve this fundamental problem:

How do I unambiguously identify a contact object in a mailbox?

  1. I can't rely on PR_ENTRYID, because this property changes when a contact or the mailbox is moved.
  2. I can't rely on my own IDs (e.g. DB table ID), because these get copied with the contact.
  3. I absolutely can't rely on fields like name or e-mail address, because they are subject to changes and updates.

Currently I use a combination of 1 (preferred) and 2 (fall-back). But inevitably, users sometimes end up syncing to the wrong contact: there is none with the given PR_ENTRYID, but there are two with the same DB ID, and the wrong one is chosen.
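To make the failure mode concrete, here is a minimal Python sketch of that lookup order. It is only an illustration, not my actual CDO code: contacts are assumed to be plain dicts, and the names `find_contact`, `entry_id` and `db_id` are hypothetical. The one deliberate difference from what I do today is that an ambiguous DB ID is reported back to the caller instead of being resolved by picking one candidate.

```python
def find_contact(contacts, entry_id, db_id):
    """Return (match, candidates) for a sync target.

    contacts: list of dicts with hypothetical keys "entry_id" and "db_id".
    match is the contact to sync to, or None if there is no safe choice.
    candidates lists the contacts sharing db_id when the choice is ambiguous.
    """
    # Preferred: an exact PR_ENTRYID-style match.
    for contact in contacts:
        if contact["entry_id"] == entry_id:
            return contact, []

    # Fall-back: match on the stored database ID.
    candidates = [c for c in contacts if c["db_id"] == db_id]
    if len(candidates) == 1:
        return candidates[0], []

    # Zero or several candidates: do not guess, let the caller decide.
    return None, candidates


mailbox = [
    {"entry_id": "A1", "db_id": 7},
    {"entry_id": "B2", "db_id": 7},  # copy of A1 that kept the DB ID
    {"entry_id": "C3", "db_id": 9},
]
match, candidates = find_contact(mailbox, "MOVED", 7)
```

In the last call the entry ID no longer exists and two contacts carry DB ID 7, so `match` is `None` and both contacts come back in `candidates`. That is exactly the case where my current code picks one and is sometimes wrong.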

There are a bunch of Outlook-syncing products out there, so I guess the problem must be solvable.

+2  A: 

I had a similar problem to overcome with an internal Outlook plugin that does contact syncing. I ended up sticking a database ID in the Outlook object and referring to that when doing syncs.

The difference here is that our system has a bunch of duplicates that get resolved later by the users. When they get merged, I remove the old records and update Outlook with all of the new information along with a new ID.

You could do fuzzy matching to identify duplicates, but duplicate resolution is a funny problem that's mostly trial and error. We've been successful at implementing "fuzzy" matching logic using the Levenshtein distance algorithm for names and addresses cleaned down to a hash code.
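For what it's worth, the core of that idea can be sketched in a few lines of Python. This is not our production code: the `normalize` step (lowercasing and collapsing whitespace) only stands in for our actual cleaning, and the `max_distance` threshold of 2 is an assumption you would have to tune by trial and error.

```python
def normalize(text):
    """Lowercase and collapse whitespace (a stand-in for real cleaning rules)."""
    return " ".join(text.lower().split())


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def is_probable_duplicate(name_a, name_b, max_distance=2):
    """Treat two names as likely duplicates if they are within max_distance edits."""
    return levenshtein(normalize(name_a), normalize(name_b)) <= max_distance
```

With this, `is_probable_duplicate("Jon  Smith", "john smith")` is true because the cleaned names differ by a single edit. A fixed threshold is crude, though: short names collide easily, so anything flagged this way should still go to a user for confirmation.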

Good luck, my syncing experiences have been somewhat painful.

Josh Bush
Oh man, the age of the question alone tells me that this is not a trivial problem. I stick the Database ID into the Contact as well, but that does not help when the contact is reused (same position, different guy), or when it gets copied to make a template for a new contact in the same firm.
Tomalak
Up voted anyway. At least I have your sympathy. :-D
Tomalak