Ok, the basic situation: Due to a few mixed-up starts, a project ended up with not one but three separate databases, each containing a portion of the overall project data. All three databases have the same structure; it's just that, say, 10% of the project was run into the first, then a new DB was made due to a code update and 15% of the project was run into that one, then another code change required yet another new database for the rest of the project. Again, the pertinent tables are exactly the same across all three databases.

Now, assume I wanted to take all three of those databases - bearing in mind that they can't just be merged into a single database due to primary key issues and so on - and run a single query that would look through all three of them, select a given set of data from each, then combine those three sets into one single result and return it to the reporting page I'm working on.

For reference, at its endpoint the data is output to an ASP.NET/VB.NET-backed page, specifically a GridView control. It doesn't need to be edited, fortunately, just displayed.

What would be the best way to approach this mess? I'm thinking that creating a temporary table would be my best bet, but honestly I'm stepping into a portion of SQL that I'm not familiar with here, and would appreciate any guidance somebody more experienced might have.

+3  A: 

Why not just use 3 part naming on the tables and union them all together?

select db1.dbo.Table1.Field1, 
       db1.dbo.Table1.Field2
from   db1.dbo.Table1
UNION 
select db2.dbo.Table1.Field1, 
       db2.dbo.Table1.Field2
from   db2.dbo.Table1
UNION 
select db3.dbo.Table1.Field1, 
       db3.dbo.Table1.Field2
from   db3.dbo.Table1
-- where  ...
-- order by ...
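
One thing to watch with the commented-out clauses: a WHERE written after the last SELECT filters only that last SELECT, while a trailing ORDER BY sorts the whole combined result. A minimal sketch of applying both to the combined set, assuming SQL Server and using a placeholder filter (the `IS NOT NULL` condition is only an illustration, not part of the original query):

```sql
-- Wrap the union in a derived table so the filter applies to all three sources.
SELECT u.Field1,
       u.Field2
FROM (
    SELECT Field1, Field2 FROM db1.dbo.Table1
    UNION
    SELECT Field1, Field2 FROM db2.dbo.Table1
    UNION
    SELECT Field1, Field2 FROM db3.dbo.Table1
) AS u
WHERE u.Field1 IS NOT NULL   -- placeholder condition, applied to the combined rows
ORDER BY u.Field1;           -- sorts the final, combined result
```

If the filter is selective, repeating it inside each SELECT can also help, since each database then returns fewer rows before the union.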
Scott Ivey
How do you suggest that updates be done after the data is in the application?
Raj More
I'll give this a shot first, thanks.
Clyde
@Raj, per the original question - "It doesn't need to be edited, fortunately, just displayed." If it did need to be updated, though, I'd use a partitioned view with check constraints on the PK of the base tables.
Scott Ivey
+1  A: 

You can actually join tables in different databases, even on different servers. If I remember right, the syntax changes from "tablename.columnName" to "Server.Database.Owner.tablename.columnName". For databases on another server you will need to run some stored procedures as an admin to allow this connectivity. It's also pretty slow, but the effort to get it working is low.
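
As a rough sketch of that setup, assuming SQL Server and that the other database lives on a separate server: `REMOTESRV` is a made-up server name, and `db2`, `Table1`, `Field1` and `Field2` are the placeholder names used elsewhere in this thread. If all three databases sit on the same instance, three-part names are enough and no linked server is needed.

```sql
-- Assumption: SQL Server; REMOTESRV is a hypothetical network name of the remote instance.
-- Register the remote server (needs admin-level permissions).
EXEC sp_addlinkedserver
    @server     = N'REMOTESRV',
    @srvproduct = N'SQL Server';

-- Connect to the remote server using the caller's own credentials.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'REMOTESRV',
    @useself    = N'TRUE';

-- Four-part name: server.database.schema.table
SELECT t.Field1,
       t.Field2
FROM REMOTESRV.db2.dbo.Table1 AS t;
```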

If you have time to do it right look at data warehouse concepts. That's basically a temp table that collects the data you need to report on.

Jay
Data warehousing is another one that I'll look deeper into when I have time to do it right, thanks. Joins wouldn't work here - I'm not selecting additional related data from other tables so much as selecting the same data from the same table in another database.
Clyde
+1  A: 

You should create what is called a Partitioned View for each of your tables of interest. These views do a union of the underlying base tables and, if needed, add a synthetic column to make the rows unique:

  CREATE VIEW vTableXDB 
  AS
  SELECT 'DB1' as db_key, *
  FROM DB1.dbo.table
  UNION ALL
  SELECT 'DB2' as db_key, *
  FROM DB2.dbo.table
  UNION ALL
  SELECT 'DB3' as db_key, *
  FROM DB3.dbo.table;

You create one such view for each table and then design your reports on these views, not on the base tables. You must add the db_key to your join conditions. The query optimizer has some understanding of partitioned views and might be able to create plans that do the right thing and avoid joins that span multiple DBs, but that is not guaranteed. If things go haywire and the optimizer does not recognize the partitioning, resulting in very bad execution times, you may have to move the db_key into the tables themselves and add some artificial check constraints on the base tables so that the optimizer can understand the partitioning (see the article I linked for details).
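
To illustrate the join condition, here is a minimal sketch that assumes two such views already exist. The names `vParentXDB`, `vChildXDB`, `Id`, `ParentId` and `Field1` are hypothetical and stand in for whatever tables and keys the project actually has:

```sql
-- Hypothetical views built like vTableXDB above, one per base table.
-- Matching on db_key keeps each row paired with rows from its own database,
-- so an Id that repeats across databases cannot produce a false match.
SELECT p.db_key,
       p.Id,
       c.Field1
FROM vParentXDB AS p
JOIN vChildXDB  AS c
  ON  c.db_key   = p.db_key
  AND c.ParentId = p.Id;
```

In effect (db_key, Id) acts as the key of the combined data, which is what gets around the duplicate primary keys.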

Remus Rusanu
This looks like something interesting for when I have time to do it right, rather than fast. Now, here's the problem - the query in question is already a mess due to other database decisions - as I said above, I'm being thrown at this on the backend. Because of the way the database is designed, I have to be able to do this through Dynamic SQL - if I recall, that doesn't work in a view. Would it still work as a stored procedure?
Clyde
Yes, it should work no matter how the query is created (dynamic sql, view, stored proc etc).
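
A rough sketch of the stored procedure route, assuming SQL Server and the vTableXDB view from the answer; the procedure name, `Field1`, `Field2` and the `@Field1Value` parameter (including its int type) are made up for illustration:

```sql
-- Hypothetical procedure that builds its query as dynamic SQL over the view.
CREATE PROCEDURE dbo.GetCombinedRows
    @Field1Value int
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(max);

    -- The statement text could be assembled conditionally here.
    SET @sql = N'SELECT db_key, Field1, Field2
                 FROM dbo.vTableXDB
                 WHERE Field1 = @p1
                 ORDER BY db_key, Field1;';

    -- Pass the value as a parameter rather than concatenating it into the string.
    EXEC sys.sp_executesql
        @sql,
        N'@p1 int',
        @p1 = @Field1Value;
END;
```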
Remus Rusanu
A slightly modified version of this ended up being my final solution, as it allowed for an 'easy' way to auto-select the correct database to pull further detail from for later queries that, fortunately, didn't need to collate data. Thanks again!
Clyde
+5  A: 

I'd say your best bet is to suck it up and combine the databases, even if it is a major pain to combine the primary keys. It may be a major pain now, but it is going to be 10x as painful over the life of the project.

You can do a union across multiple databases as Scott has pointed out, but you are in for a world of trouble as the application gets more complex. For example, even if you circumvent the technical limitations by having multiple tables/databases for the same entity, having duplicates in the PK for a logical entity is a world of trouble.

Implement the workaround solution if you must, but I guarantee you will hate yourself for it later.

JohnFx
Fortunately, the project is already done - I'm being thrown at this at the backend of the mess. Unfortunately, as much as I'd rather deal with updating the keys myself, I can't join the databases because they're a chain of custody store. Officially the project is done, and absolutely no major modifications are allowed.
Clyde
Don't worry. Eventually you'll get another shot at it. Apps like this are notorious for whipping back around on ya. =)
JohnFx
A: 

Building on Scott Ivey's excellent example above,

  1. Use table name aliasing to simplify your code
  2. Use UNION ALL instead of UNION, assuming that your data is unique across the three databases; UNION ALL skips the duplicate-removal step that UNION performs

Code:

select
    d1t1.Field1, 
    d1t1.Field2
from db1.dbo.Table1 AS d1t1
UNION ALL
select
    d2t1.Field1, 
    d2t1.Field2
from db2.dbo.Table1 AS d2t1
UNION ALL
select
    d3t1.Field1, 
    d3t1.Field2
from db3.dbo.Table1 AS d3t1
-- where  ...
-- order by ...
Rob Garrison