tags:

views:

192

answers:

2

When data is copied over from source to destination in a SSIS package, source being a sql query with 'group by' keywords used and destination being a table, is it necessary that the data at a row position has to match the data at the same row position at the destination table??

sagar

+1  A: 

Its a performance question, really. Tables have no logical ordering. Or course the data does have a physical order on disk, and I/O has a significant effect on performance, so the best approach will depend on a) how the table is being populated (complete refresh vs. incremental update) and b) how the table is used downstream.

You could create a clustered index on the target table with the same columns as you have in the GROUP BY clause. This will physically order the data on disk by the keys of the clustered index.

If the target table is completely repopulated each time the package is run (drop-recreate or truncate), this may be a good design, since the incoming data will probably be in the right order.

If the target table is incrementally updated each time the package is run, this may be a bad design, since the database will have to interleave the incoming data with existing data on each insert, which can be quite expensive.

Peter
there is already a clustered index there; however i created non clustered indexes on the columns that are used in "group by" clause in the source query. So, i used the select query using order by clause using the columns in the group by clause.But still, the row-row does not match up. My main purpose of this question was to find an answer so that I can match up the source-destination row by row.
sagar
You don't need an index, clustered or non-clustered, for reconciling source to target. The non-clustered index will slow the table insert operation, so you may want to remove it if you only created it for this purpose. What you need to do is a) isolate the source and target data for a particular insert, and b) compare row for row. Can you isolate the data on both sides? Is the insert reproducible? Start with a count(*), and then move onto more detailed reconciliation.
Peter
+1  A: 

You can use a clustered index to force things to be stored in an ordered way, but as Peter notes this has a performance penalty for incremental updates.

Are you concerened about getting things out in order? That's an ORDER BY on your queries or perhaps you should create a standardised view that shows things in the order you want.

DaveE