Hi,

I have little knowledge of data transformations in SSIS and am basically teaching myself.

I have learned some of them and am now looking into fuzzy logic.

I am stuck on Fuzzy Grouping and Fuzzy Lookup in SSIS.

I cannot figure out how to use them; a Google search turned up some results, but they are beyond my current level.

Could anyone please suggest a step-by-step tutorial for implementing them?

It would be great if the example contained diagrams so that I can follow along easily.

Also, in which cases should I use them (I mean a real-world scenario)?

Thanks in advance

+1  A: 

Here is a good start for understanding what the Fuzzy Lookup component does (Fuzzy Grouping is similar): SSIS fuzzy lookup

I actually used this at a client where I was receiving client data that someone had fat-fingered in. I created a static lookup table based on company names:

Lookup Table (notice that both columns are the same to begin with)

Name | Lookup Output Name

Microsoft | Microsoft

JP Morgan Chase | JP Morgan Chase

McDonalds | McDonalds

I would receive data in a text file that looked like this:

Typed Name

Microsft

JP Morgan

McDons

Using the fuzzy lookup, I would join on the Name column (don't forget this is case sensitive; use UPPER or LOWER to normalize the case) to get the Lookup Output Name. I set the similarity threshold to about 80% (I recommend that percentage or higher). I would then view my matches via the data viewer, which might look like this:

Typed Name | Lookup Name | Confidence | Similarity

Microsoft | Microsoft | 100% | 100%

JP Morgan | JP Morgan Chase | 88% | 90%

McDons | McDonalds | 60% | 50%
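If it helps to see the idea outside SSIS, here is a minimal Python sketch of what a fuzzy lookup does: compare each typed name against the reference names and keep the closest one. It uses `difflib.SequenceMatcher` from the standard library as a stand-in, which is an assumption for illustration only. SSIS uses its own matching algorithm, so the scores printed here will not equal the confidence and similarity values in the table above. The names `similarity` and `best_match` are made up for this example.

```python
from difflib import SequenceMatcher

# Reference names taken from the lookup table in this answer.
LOOKUP_NAMES = ["Microsoft", "JP Morgan Chase", "McDonalds"]


def similarity(a: str, b: str) -> float:
    """Return a score from 0.0 to 1.0, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(typed_name: str, candidates=LOOKUP_NAMES):
    """Return (closest candidate, score) for one typed name."""
    scored = ((name, similarity(typed_name, name)) for name in candidates)
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for typed in ["Microsft", "JP Morgan", "McDons"]:
        name, score = best_match(typed)
        # Scores come from difflib, not from SSIS.
        print(f"{typed} | {name} | {score:.0%}")
```

Lowercasing both sides before comparing plays the same role as the UPPER/LOWER cast on the join column.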

Then, based on a conditional split, I loaded the rows with both confidence and similarity greater than 80% and less than 100% into the lookup table, and loaded the others into an error table. An email was then sent if the count in the error table was greater than one. So the resulting lookup table would look something like this:

Lookup Table

Name | Lookup Output Name

Microsoft | Microsoft

JP Morgan Chase | JP Morgan Chase

McDonalds | McDonalds

JP Morgan | JP Morgan Chase


Error Table

Name | Proposed Name | Error message

McDons | McDonalds | Confidence was 60% and Similarity was 50%
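The conditional split logic can be sketched roughly like this in Python, using the rows from the data viewer example. This is only an illustration, not the actual SSIS expression. One assumption on my part: rows where both scores are exactly 100% are treated as already present in the lookup table and skipped, which is what the example tables suggest. The function name `route` and the tuple layout are hypothetical.

```python
THRESHOLD = 80  # percent, the threshold used in this answer

# (typed name, proposed lookup name, confidence %, similarity %)
ROWS = [
    ("Microsoft", "Microsoft", 100, 100),
    ("JP Morgan", "JP Morgan Chase", 88, 90),
    ("McDons", "McDonalds", 60, 50),
]


def route(rows, threshold=THRESHOLD):
    """Split matched rows into lookup-table additions and error rows."""
    additions, errors = [], []
    for typed, proposed, confidence, similarity in rows:
        if confidence == 100 and similarity == 100:
            # Assumption: an exact match is already in the lookup table.
            continue
        if threshold < confidence < 100 and threshold < similarity < 100:
            additions.append((typed, proposed))
        else:
            message = (
                f"Confidence was {confidence}% and Similarity was {similarity}%"
            )
            errors.append((typed, proposed, message))
    return additions, errors


if __name__ == "__main__":
    additions, errors = route(ROWS)
    print("Add to lookup table:", additions)
    print("Error table:", errors)
```

Keeping the threshold as a parameter makes it easy to tighten or loosen the cut-off once you have seen how many rows land in the error table.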

Hope this helped.

rfonn
