I'm converting several large ETL processes from SSIS over to Rhino-Etl. I got fed up with the "grahical programming" nature of SSIS. Hunting through hundreds of dialog boxes to figure out what the ETL process is doing gets old in a hurry.
Several of the SSIS packages use the Fuzzy Match component for joining user accounts based on first name, last name, and DOB. The fuzzy match is necessary because the names are not always spelled the same across the two systems. There might be a legal name in one and a nick name in the other, typos, marriage name changes, etc.
The SSIS fuzzy lookup component does an OK job of handling these. It's not perfect, but it still greatly reduces the number of records that require a human's help.
Neither .Net nor Rhino-Etl have anything like this out of the box, so I'm looking for suggestions.
I have tried using the SoundEx t-sql function, but it doesn't produce very good results.
I'm currently considering using Levenshtein or possibly a Lucene.Net index.
Does anyone have any advice on either of those? Please feel free to suggest something completely different as well.