I am a system administrator for a company that supports a CRM CMS (Salesforce).
I don't like the built-in functionality so I generally manage data through CSV/XLS files for uploading and downloading, because I can write better queries and the like. One of my tasks is uploading Contacts and Accounts. Because of a lack of unique Identifiers and misspellings, I'm encountering duplicates in uploaded records.
- When uploading contacts, I try to match by email, phone number and/or lastname/firstname
- With companies I am generally limited to just the name of the account, with numerous possible abbreviations and misspellings
I'm looking for a better method to check for duplicates with the following constraints:
Given a list of names, emails or phone numbers (all stored as text fields), do a comparison check between two tables looking for the best match out of the second table. Preferably between multiple fields, but even if it's just one, it will greatly assist my endeavor.
The size of the dataset is about 17,000 records on the longest table. Values are generally entered about 50 at a time. My limitations are the hardware I have and no budgetary discretion. My programming ability is pretty basic, but I can learn and have Eclipse & Visual Studio on the system.
Does anyone have a suggestion as how I can either solve this problem programmatically, or with a third party tool?