tags:

views:

193

answers:

5

The workflow is like this:

  1. I receive a scan of a coupon with data (firstname, lastname, zip, city + misc information) on it.
  2. Before I create a new customer, I have to search the database if the customer might exist already.

Now my question: What's the best way to find an existing customer, when there is no unique ID available?

PS: I do have a unique ID in the database, just not on the coupons we receive ;)

+2  A: 

See this previous question: http://stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string#16444.

Soundex would help you if you require similiar matches.

Mitch Wheat
A: 

You query the database for all customers that match the given data, e.g.

SELECT ID FROM tbl_customers WHERE 
   first_name LIKE 'JOHN' 
   AND last_name LIKE 'Doe' 
   AND zip_code=12345 
   AND city LIKE 'Ducktown'

If the number of rows returned is 0, create a new entry in the database. If it is 1, the query will give you the ID. If it is > 1 you may have several customers of the same name living in the same area, need to find a way to deal with this situation. But that would justify a new question here ;-)


p.s.: If you have no unique ID at all, redesign your database.

Treb
+3  A: 

We are using the Levenshtein distance algorithm to check users for duplication. However we have quite strict rules to enter the data itself, so we have to check only for misstyping, case differences and such.

Biri
A: 

If you have SQL Server 2005, you can bring your data in through SSIS and use a fuzzy lookup to check for sameness.

HLGEM
+2  A: 

If you really want to do this the right way, the easy way, the complete way you'll buy Netrics.

http://www.netrics.com/

We bought it, and wrapped an application around it that lets our employees match anything they want. The can configure confidence intervals for each column, build thesauri where you can map Robert to Bob, and John to Jack. It's amazing and used by some of the larger institutions in the country for scrubing various lists.