views:

126

answers:

1

I have 700k-1M rows coming in with a comma delimited field (among others). I need to keep this column intact, but sort the rows so that I can determine duplicates. In this business case, a,3,null,40 is the same as null,40,a,3 and so on.

This can be achieved via stored proc, script component, tsql.

I found this component which does exactly what I need, I just don't have the knowledge for how to pull the script out or how to compile it into a component!

http://sortdedupdelimstring.codeplex.com/

Any help is very much appreciated.

Please let me know if I was unclear anywhere.

+2  A: 

The project is a Visual Studio 2005 Visual Basic project. It is for SQL Server 2005.

Load the project into VS and compile (F7), it will compile and install the component into SSIS for you. Create an SSIS project, click the Data Flow tab, and right-click anywhere on the Toolbox, select 'Choose Items...'. Go to the 'SSIS Data Flow Items' tab, locate the 'SortDeDuplicateDelimitedString' component and check the box. Now the component will appear in the 'Data Flow Transformations' list, and you can use it like you would any other component.

Marc Bernier
I don't have visual studio. Only BIDS.
elgabito
I just looked at the .zip file again, the DLL is already built for you. Look in the bin directory, copy the DLLs into /Program Files/Microsoft SQL Server/90/DTS/PipelineComponents/, restart BIDS and follow the steps above.
Marc Bernier
I'll take Marc's word for it that this works - ended up doing something different because Windows 7 won't led you add files to the assembly directory and I have yet to see someone clarify how it's done on Windows 7.
elgabito
You probably need to use gacutil.exe to register the file in the global assembly cache. Sorry, I forgot all about this step, my post-build does it automatically for me:gacutil /if SortDeDuplicateDelimitedString.dll(you'll need to be logged in as admin)
Marc Bernier