What's the best way to strip out characters from flat files in SSIS? In my case, I need to remove all quotes from the file before processing.

EDIT:
How can I run an executable against some files from SSIS? Can I somehow use the source connection as an input or would I have to pass in the file names as parameters?

A: 

If I understand your question correctly, you would like to remove the quotes from the column values in your text file. If that is the case, use a Derived Column transformation. Select Replace "column_name" in the Derived Column drop-down, then set the Expression property to the following: REPLACE([column_name], "\"", "")

Hope this helps.

rfonn
I need to remove the quotes from the entire file before it is processed. Otherwise, the columns are getting screwed up.
Even Mien
If this is the case, I would use a Script Task in your control flow to strip the quote characters from the file first. Then you can use your data flow to do all your ETL work. I guess I missed the whole "before processing" part of your question.
rfonn
+1  A: 

The easiest way to do this would be to create a "Transformation" script component and use code to strip the quotes.

unclepaul84
This method would work too, as long as you are familiar with .NET (VB only for SSIS 2005; VB or C# for SSIS 2008) and are willing to write a little more code than what I suggested in my answer.
rfonn
Can this be run pre-processing on the file? I need to remove the quotes from the file beforehand, so that the columns can be imported correctly.
Even Mien
You could use the Script Task in the control flow before you get to the data flow that loads the file. The only issue is that you have to preprocess one "sample" file before you create the data flow, so that you can "prime" the flat file source with the correct columns.
unclepaul84
A: 

Both unclepaul84's and Ryan Fonnett's solutions would work, but personally I lean towards unclepaul's because I now have multiple files that need quotes stripped out, and I can use the same transformation code for every one (which is nice).

ajdams
Good point: if you have more than 4 or 5 columns, you could probably create a reusable 'replace' function, which may save some time in your script component when developing your package. However, you can also fix one or many columns using the same expression in a single Derived Column transformation. So I guess it depends on how many columns have the quote character issue and how comfortable you are writing .NET.
rfonn
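
The reusable 'replace' function mentioned in the comment above could look roughly like the sketch below. It is written in Python purely for illustration; an actual SSIS script component would be VB or C#, and the row layout and column names here are hypothetical.

```python
def strip_quotes(value: str) -> str:
    """Remove every double-quote character from one column value."""
    return value.replace('"', "")


def clean_row(row: dict, columns: list) -> dict:
    """Return a copy of the row with quotes stripped from the named columns."""
    cleaned = dict(row)
    for name in columns:
        cleaned[name] = strip_quotes(cleaned[name])
    return cleaned


# Hypothetical row and column names, for illustration only.
row = {"first_name": '"Ann"', "city": '"Oslo"', "age": "41"}
result = clean_row(row, ["first_name", "city"])
```

The point is that the per-column logic lives in one place, so adding another affected column only means extending the list of names.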
+1  A: 

I did this using a Derived Column transformation.

For example, to strip $, ', and spaces from a column named name in SSIS using a Derived Column, I would write:

Replace(Replace(REPLACE(name,"$"," "), "'", " ")," ","")

I feel this approach is good if only 1 or 2 columns need to be cleaned. If more are involved, go with a Script Task instead.

deeps_rule
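
To see what the nested expression above actually does to a value, here is a small Python mirror of it (an illustration only, not SSIS code; the sample string is made up). Note that because the last step replaces spaces with nothing, every space in the value is removed too, not just the ones introduced by the first two steps.

```python
def nested_replace(name: str) -> str:
    # Mirrors Replace(Replace(REPLACE(name,"$"," "), "'", " ")," ","")
    step1 = name.replace("$", " ")   # "$" becomes a space
    step2 = step1.replace("'", " ")  # "'" becomes a space
    return step2.replace(" ", "")    # all spaces are then removed


# Hypothetical sample value.
sample = "O'Neil $5 fee"
cleaned = nested_replace(sample)
```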
+1  A: 

Since it's something that you have to do for all the fields in your files, I'd recommend doing it as a first step of the process and not as an operation in the transformation workflow.

You can code your own .NET script and embed it in a Script Task. You can also call a third-party tool or component via an Execute Process Task.

For instance, if you have access to a Cygwin Unix command line, something like this should do the work (it writes the result to standard output, so redirect it to a new file):

sed s/\"//g data1.txt

You can call an executable via the Execute Process Task mentioned above, and you can parametrize its inputs by setting expressions on the component's properties. Those expressions can be based on variables that might be set via configuration files. (This is just one of the many ways that SSIS provides to achieve something like this.)

river0
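
As a rough sketch of the "pass the file names as parameters" route discussed above, a small standalone script could take the file paths as command-line arguments and write a quote-free copy of each. Everything here is an assumption for illustration: the script name, the ".clean" suffix for the output files, and UTF-8 as the file encoding.

```python
import sys
from pathlib import Path


def strip_quotes_from_file(source: Path, target: Path) -> None:
    """Copy source to target line by line, dropping every double quote."""
    # newline="" keeps the original line endings untouched.
    with source.open("r", encoding="utf-8", newline="") as src, \
         target.open("w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line.replace('"', ""))


def main(paths: list) -> None:
    # Each path is a flat file; the cleaned copy is written next to it
    # with a hypothetical ".clean" suffix appended to the file name.
    for arg in paths:
        source = Path(arg)
        strip_quotes_from_file(source, source.with_name(source.name + ".clean"))


# Does nothing when no file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

In an Execute Process Task, the Executable property would point at the interpreter and the Arguments property would carry the script name and file path, with Arguments set from an expression on a package variable. The data flow's flat file source would then read the ".clean" copy rather than the original.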