Let me first say that being able to take 17 million records from a flat file, push them to a DB on a remote box, and have it take 7 minutes is amazing. SSIS truly is fantastic. But now that I have that data up there, how do I remove duplicates?
Better yet, I want to take the flat file, remove the duplicates from it, and write the remaining rows out to another flat file.
I am thinking about:
- A flat file source (with an associated file connection)
- A For Loop container
- A Script container that contains some logic to tell whether another row already exists (see the sketch after this list)
- A Data Flow Task
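For the duplicate check itself, the core idea I have in mind is a hash-set lookup: remember the keys already seen and only pass through rows whose key is new. Below is a minimal sketch of that logic in plain Python, outside SSIS, just to illustrate the approach; the file names, delimiter, and key columns are placeholders I made up, not anything from my actual package.

```python
# Minimal sketch of the duplicate-removal logic.
# File names, delimiter, and key columns are hypothetical placeholders.
import csv

def dedupe_flat_file(in_path, out_path, key_columns=None, delimiter="\t"):
    """Copy in_path to out_path, keeping only the first occurrence of each key.

    key_columns: list of 0-based column indexes that identify a duplicate;
    if None, the entire row is treated as the key.
    """
    seen = set()
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)
        for row in reader:
            key = tuple(row) if key_columns is None else tuple(row[i] for i in key_columns)
            if key not in seen:          # first time this key appears -> keep the row
                seen.add(key)
                writer.writerow(row)

# Example usage (hypothetical paths):
# dedupe_flat_file("input.txt", "deduped.txt", key_columns=[0, 2])
```

One obvious cost of this approach is that the set of keys for all 17 million rows has to fit in memory, so the choice of key columns matters.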
Thank you; everyone on this site is incredibly knowledgeable.
Update: I have found this link, which might help in answering this question.