views:

24

answers:

2

I have a dataflow, where there is a DB source and a flat text file destination(delimited by pipe '|'). The DB source is picking up the SQL query from a variable.

Problem is that if my DB field size of say, firstname and lastname are 30 characters, i get the output as(space represented by dots)

saurabh......................|kumar.......................

What I need is the fields to be trimmed, so that the actual output is

saurabh|kumar

I have more than 40 columns to write, and I would not want to manually insert RTRIM after every column in my BIG sql query :(

I should add that the source can have upto 50,000 rows returned. I was thinking of putting a script component in between, but processing every row might have a performance impact.

Any ideas?

A: 

You could try using a script component in the data flow? Unlike the control flow, a data-flow script component has inputs & outputs.

Look at this example in MSDN: http://msdn.microsoft.com/en-us/library/ms345160.aspx

If you can iterate each column of the row (?) as it flows through the script component, you could do a .Net trim on the column's data, then pass the trimmed row to the output.

Advantage there of course is it will trim future rows you add later.

Just an idea, I haven't tried this myself. Do post back if it works.

Meff
i might have upto 50,000 rows, with about 40 columns. Processing each individually though script component might have a serious performance impact i guess :/
Saurabh Kumar
@Saurabh - I have used a script data-flow component for a different task, and performance was good. In that one, I took a delimited string and split it into columns, then sent each row to the output buffer. I'd say it's worth a go - 50k rows and 40 columns isn't that much,
Meff
A: 

You have quite a few options, but some will obviously be undesirable or impossible to do because of your situation.

First, I'll assume that trailing spaces in the data are because the data types for the source columns are CHAR or NCHAR. You can change the data types in the source database to VARCHAR or NVARCHAR. This probably isn't a good idea.

If the data types in the source data are VARCHAR or NVARCHAR and the trailing spaces are in the data, you can update the data to remove the trailing spaces. This is probably not appealing either.

So, you have SSIS and the best place to handle this is in the data flow. Unfortunately, you must develop a solution for each column that has the trailing spaces. I don't think you'll find a quick and simple "fix all columns" solution.

You can do the data trimming with a script transformation, but you must write the code to do the work. Or, you can use a Derived Column transformation component. In the Derived Column transformation you would add a derived column for each column that needs trimming. For example, you would have a firstname column and a lastname column. The derived column value would replace the existing column value.

In the Derived Column transformation you would use SSIS expression syntax to trim the data. The firstname and lastname trim expressions would be

RTRIM(firstname)
RTRIM(lastname)

Performance will probably be better for the Derived Column transformation, but it may not differ much from the script solution. However, the Derived Column transformation will probably be easier to read and understand later.

bobs

related questions