I want to find candidate duplicate records in a large database by matching on fields like COMPANYNAME and ADDRESSLINE1.
Example:
For a record with the following COMPANYNAME:
- "Acme, Inc."
I would like my query to spit out other records with these COMPANYNAME values as possible dups:
- "Acme Corporation"
- "Acme, Incorporated"
- "Acme"
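One way to frame what "matching" means for the example above is to reduce each name to a crude key before comparing. The sketch below is a minimal illustration in Python, not a SQL Server feature. It assumes a hypothetical, hand-maintained list of legal suffixes (`LEGAL_SUFFIXES`) and a hypothetical helper `company_key`. It lowercases the name, replaces punctuation with spaces, and drops suffix tokens, so all four example names collapse to the same key.

```python
import re

# Hypothetical list of legal-entity suffixes to ignore when comparing names.
# This is an assumption for illustration, not a standard list; extend it for your data.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
                  "company", "llc", "ltd", "limited"}

def company_key(name: str) -> str:
    """Reduce a company name to a crude matching key (hypothetical helper)."""
    # Lowercase and turn anything that is not a letter, digit or space into a space.
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    # Split on whitespace and drop legal-suffix tokens.
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)

names = ["Acme, Inc.", "Acme Corporation", "Acme, Incorporated", "Acme"]
for n in names:
    print(f"{n!r} -> {company_key(n)!r}")  # every name maps to 'acme'
```

Records sharing a key become candidate duplicates. The suffix list is the fragile part: it handles "Inc." vs. "Incorporated" but does nothing for misspellings, which is where a phonetic or fuzzy comparison would come in.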
I know how to do the joins, correlated subqueries, etc. to handle the mechanics of pulling the set of data I want, and I know that has been covered on here before. What I am interested in hearing are thoughts on the best way to do the fuzzy matching: should I use full-text indexing, the SOUNDEX function, or something else I am unaware of? (I am using SQL Server 2005.)
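To see what a SOUNDEX-style comparison actually captures, here is a sketch of the standard American Soundex algorithm in Python. It is a reimplementation for illustration only. SQL Server's built-in SOUNDEX may differ from it in edge cases (older versions applied a subset of the rules), and `soundex` here is a hypothetical helper, not a library call. The code keeps the first letter plus up to three consonant-sound digits. It therefore tolerates misspellings of "Acme", but it also matches unrelated words, and it knows nothing about suffixes like "Inc." vs. "Corporation".

```python
# Consonant groups and their Soundex digits; vowels, y, h and w get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word: str) -> str:
    """American Soundex code of a single word (illustrative sketch)."""
    # Consider ASCII letters only; other characters are skipped.
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # a second letter with the same code as the first is dropped
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":
            # Vowels reset prev, so a repeated code after a vowel is kept;
            # h and w do not reset it, so the code is not repeated across them.
            prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")

for word in ["Acme", "Ackme", "Acne", "Axiom"]:
    print(word, soundex(word))  # all four print A250
```

"Ackme" matching "Acme" is the kind of hit SOUNDEX is good for; "Axiom" and "Acne" sharing the same code show how coarse the match is.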
Any help is appreciated!