I wasn't sure how to word this question, so I'll try to explain. I have a third-party database on SQL Server 2005. I have another SQL Server 2008, to which I want to "publish" some of the data in the third-party database. I shall then use this database as the back end for a portal and Reporting Services - it shall be the data warehouse.

On the destination server I want to store the data in different table structures from those in the third-party db. Some tables I want to denormalize, and there are lots of columns that aren't necessary. I'll also need to add additional fields to some of the tables, which I'll need to update based on data stored in the same rows. For example, there are varchar fields containing info that I'll want to populate other columns with. All of this should cleanse the data and make it easier to report on.

I can write the queries to get all the info I want into a particular destination table. However, I want to be able to keep it up to date with the source on the other server. It doesn't have to be updated immediately (although that would be good), but I'd like it to be updated perhaps every 10 minutes. There are hundreds of thousands of rows of data, but the changes to the data and the addition of new rows aren't huge.

I've had a look around, but I'm still not sure of the best way to achieve this. As far as I can tell, replication won't do what I need. I could manually write the T-SQL to do the updates, perhaps using the MERGE statement, and then schedule it as a job with SQL Server Agent (see the sketch below). I've also been having a look at SSIS, and that looks to be geared towards the ETL kind of thing.
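
To make the MERGE idea concrete, here's roughly what I have in mind for one target table, run as an Agent job every 10 minutes. All the object names ([SourceServer] as a linked server to the 2005 instance, the Customer/DimCustomer tables and their columns) are made-up placeholders, not my real schema:

    -- All names here are hypothetical; this just shows the shape of the job.
    -- MERGE needs SQL Server 2008, so it runs on the destination side,
    -- pulling from the 2005 source over a linked server.
    MERGE dbo.DimCustomer AS target
    USING (
        SELECT c.CustomerId,
               c.Name,
               LEFT(c.Notes, 50) AS NotesSummary  -- derived/cleansed column
        FROM [SourceServer].[ThirdPartyDb].dbo.Customer AS c
    ) AS source
    ON target.CustomerId = source.CustomerId
    WHEN MATCHED AND (target.Name <> source.Name
                      OR target.NotesSummary <> source.NotesSummary) THEN
        UPDATE SET Name = source.Name,
                   NotesSummary = source.NotesSummary
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerId, Name, NotesSummary)
        VALUES (source.CustomerId, source.Name, source.NotesSummary)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;

(The change-detection predicate would need NULL handling in practice, but that's the idea.)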

I'm just not sure what to use to achieve this, and I was hoping to get some advice on how one should go about doing this kind of thing. Any suggestions would be greatly appreciated.

A: 

Nick, I have gone the SSIS route myself. I have jobs that run every 15 minutes, built in SSIS, that do exactly what you are trying to do. We have a huge relational database, and we wanted to do complicated reporting on top of it using a product called Tableau. We quickly discovered that our relational model wasn't really so hot for that, so I built a cube over it with SSAS, and that cube is updated and processed every 15 minutes. Yes, SSIS gives the impression of being mainly for straight ETL jobs, but I have found that it can be used for simple, quick jobs like this as well.

ajdams
Thanks for the response - interesting to hear that you used SSIS to achieve a similar thing. How is the performance using SSIS? What kind of SSIS features did you use - was it just some T-SQL code that you executed, or did it involve using the data flow transformations?
Nick
It involves both, actually. I use numerous data flow transformations to update dimensions, and a few T-SQL components to clear raw data tables I use during the process as well. I haven't had any problems with performance, and no one has complained about slowdowns on the server when the job executes. Overall, we are very happy so far with SSIS.
ajdams
Performance sounds good then. And do you use a staging table for that too? I was reading around this subject, and a staging table was suggested, along with the use of partitioning to then switch in the newly updated data.
Nick
Yes I have a staging table that I populate and then truncate at the end.
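The switch variant you read about would look roughly like this, with made-up table names - SWITCH needs identical schemas on the same filegroup and an empty target, and it's metadata-only, so the swap is effectively instant:

    -- 1. Rebuild the refreshed rows in a staging table.
    TRUNCATE TABLE dbo.FactSales_Staging;
    INSERT INTO dbo.FactSales_Staging (SaleId, CustomerId, Amount)
    SELECT SaleId, CustomerId, Amount
    FROM dbo.FactSales_Fresh;  -- placeholder for however the new data is assembled

    -- 2. Swap the old data out and the new data in (FactSales_Old starts empty).
    ALTER TABLE dbo.FactSales SWITCH TO dbo.FactSales_Old;
    ALTER TABLE dbo.FactSales_Staging SWITCH TO dbo.FactSales;

    -- 3. Clear the old copy so the next run can switch into it again.
    TRUNCATE TABLE dbo.FactSales_Old;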
ajdams
+1  A: 

For those tables whose schemas/relations are not changing, I would still strongly recommend replication.

For the tables whose data and/or relations are changing significantly, I would recommend that you develop a Service Broker implementation to handle it. The high-level approach with Service Broker (SB) is:

Table-->Trigger-->SB.Service >====> SB.Queue-->StoredProc(activated)-->Table(s)
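
Bare-bones, the plumbing looks like this. Every name below is invented for illustration, and this shows the single-database shape - distributing it between your two servers additionally needs Service Broker endpoints and routes:

    -- Illustrative names throughout.
    -- Plumbing: message type, contract, queue, and service.
    CREATE MESSAGE TYPE [//DW/RowChanged] VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT [//DW/SyncContract] ([//DW/RowChanged] SENT BY INITIATOR);
    CREATE QUEUE dbo.DwSyncQueue;
    CREATE SERVICE [//DW/SyncService] ON QUEUE dbo.DwSyncQueue ([//DW/SyncContract]);
    GO

    -- Trigger: package the affected keys as XML and send them to the service.
    -- (A real version would also pick up keys from the deleted table.)
    CREATE TRIGGER dbo.trg_Orders_Sync ON dbo.Orders
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @h UNIQUEIDENTIFIER, @msg XML;
        SET @msg = (SELECT OrderId FROM inserted
                    FOR XML PATH('row'), ROOT('rows'), TYPE);
        IF @msg IS NULL RETURN;
        BEGIN DIALOG CONVERSATION @h
            FROM SERVICE [//DW/SyncService]
            TO SERVICE '//DW/SyncService'
            ON CONTRACT [//DW/SyncContract]
            WITH ENCRYPTION = OFF;
        SEND ON CONVERSATION @h MESSAGE TYPE [//DW/RowChanged] (@msg);
    END;
    GO

    -- Activated proc: drain the queue and apply the changes to the target tables.
    CREATE PROCEDURE dbo.usp_ProcessDwSync
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER, @msg XML;
        WHILE 1 = 1
        BEGIN
            WAITFOR (RECEIVE TOP (1) @h = conversation_handle,
                                     @msg = CAST(message_body AS XML)
                     FROM dbo.DwSyncQueue), TIMEOUT 1000;
            IF @@ROWCOUNT = 0 BREAK;
            -- ...shred @msg and upsert the identified rows here...
        END;
    END;
    GO

    ALTER QUEUE dbo.DwSyncQueue
        WITH ACTIVATION (STATUS = ON,
                         PROCEDURE_NAME = dbo.usp_ProcessDwSync,
                         MAX_QUEUE_READERS = 1,
                         EXECUTE AS OWNER);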

I would not recommend SSIS for this, unless you wanted to go to something like daily exports/imports. It's fine for that kind of thing, but IMHO far too kludgy and cumbersome for either continuous or short-period incremental data distribution.

RBarryYoung
I do require changes to the schema and need to check data too, so I don't think replication will suit in this instance. The third-party db doesn't have primary keys for each of the tables either, which would rule out transactional replication. The Service Broker approach is a solution that could work, although I'll have to add triggers to multiple tables, and it'll take a bit more work to get a Service Broker solution going. I do share your concerns regarding the SSIS approach, but I've heard of others updating their DW more frequently than this and assume that they are using SSIS for it.
Nick
Well, SSIS is not going to handle ongoing schema changes well at all. This at least is something that Service Broker has a chance of handling (with DDL triggers as a source). I don't know of anything else that can, when the source and target DB schemas start out different.
RBarryYoung
A: 

I think staging and partitioning will be too much for your case. I am implementing the same thing in SSIS now, but with a frequency of one hour, as I need to allow some time for support activities. I am sure that using SSIS is a good way of doing it.

During the design, I thought of another way to achieve custom replication: building on the Change Data Capture (CDC) process. That way you can get near-real-time replication, but it is a tricky thing.
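
For anyone on a new enough version, the raw ingredients look like this - dbo.Orders is just an example table name, and persisting the LSN watermark between runs is left out:

    -- CDC needs SQL Server 2008+. Enable it on the db and on each source table.
    EXEC sys.sp_cdc_enable_db;
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Orders',   -- placeholder table
         @role_name     = NULL;

    -- Poll for changes since the last run and feed them to the warehouse load.
    DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);
    SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Orders');
    SET @to_lsn   = sys.fn_cdc_get_max_lsn();

    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');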

Faiz
I think CDC only came in version 2008, and I'm stuck with 2005 for the source, so I haven't considered that. You say that staging and partitioning will be too much - do you mean because of the frequency at which I'd want to update?
Nick
Both the frequency and the delays it may cause. If you weigh the cost against the gain, you may not go for it.
Faiz