views:

104

answers:

4

My warehouse datamart data is split across 3 databases on the same server. This is Proof-Of-Concept project with three extracts that were loaded into individual databases.

While pulling into a cube, I am essentially doing this:

SELECT * FROM DB1.dbo.Fact_Pres
UNION
SELECT * FROM DB2.dbo.Fact_Pres
UNION
SELECT * FROM DB3.dbo.Fact_Pres

Should I actually consolidate the data into ONE table? Will that make my processing any faster?

I have no issues with disk space - I want to implement the best solution.

In either case, can you help me understand why the method you suggest would be optimal?

+3  A: 

Yes, you definitaly should. There is no point in splitting the same table in different databases. If you have problems with hard disk space, think about partitioning your table.

Regarding your comment:

The performance cost isn't THAT huge, but a union performs a merge join, which brings a bit ov an overhead.

In addition to that, are you sure you are using UNION correctly? UNION will eliminate duplicate values. Maybe what you really want to do is UNION ALL?

Maximilian Mayerl
Can you help me understand why the processing would be faster from one table rather than 3 tables that are in the UNION?
Raj More
UNION ALL is indeed what I want. Thank you!
Raj More
+3  A: 

Consider federating your table using SQL Server's first class partitioning (as opposed to doing it yourself). If you're always select every data point, then maybe yeah, pulling off multiple disks is faster.

But why have multiple databases? You could always stack the three tables into one table, but have that one table implemented atop three drives RAIDed together. This is a more clear cut solution if what you're after is speed.

The federating only makes sense if you're ever selecting specific, adjacent parts of the set. But according to your OP, you're selecting everything, so that eliminates that benefit.

Mark Canlas
I am doing partitioning in my cube where the reports run from.
Raj More
If you're doing reporting, why do you mention that you're UNIONing the tables together? When are you ever querying for every single data point?
Mark Canlas
I query everything from the fact tables in SQL Server and my SSAS cube holds all the aggregates required for reporting. I must be missing the thrust of your question, though. Please elaborate
Raj More
+1  A: 

A UNION essentially performs a select distinct so that it can remove duplicate records. This will (potentially) impair performance all by itself. You could rectify that by changing it to UNION ALL.

Beyond that, you should check the execution plan to see what kind of performance hit you're getting. I do know that SQL Server will use indexes even on tables in other databases, but what you're doing here still doesn't make a lot of sense. I'd suggest using table partitioning and keeping it all in one database if the normal usage scenario is to UNION all of the tables together, there is little if any good reason to separate them into different databases.

Donnie
I will change this to a `UNION ALL`.
Raj More
+2  A: 

The cross-database query is (somewhat) slower than than in-database, If you need three separate tables, it is advisable to use different schemas inside the same db. If this happens to be one fact table, then it is best to load into a a single fact table and use partitioning if the size happens to be too big.

When it comes to ETL, the suggestion is to ETL into staging tables in a separate schema -- like ETL -- within same DB and load the fact table from there. Once done, you truncate staging tables.

The recommendations are from the Microsoft Project Real.

Also keep in mind that foreign key can not be used across databases.

Damir Sudarevic
This is not a cross database JOIN - which I can understand would be slower. This is concatenating tables together. Since this is a warehouse, the ETL takes care of validating data so there are no FKs and no constraints.
Raj More
+1 for Project real (wish I could +2)
Raj More