views: 295
answers: 4

Hi,

I have a server (SQL Server 2005) with multiple archive databases (1 per quarter stretching back 8 years) that are all structurally identical.

I frequently need to query back over a certain date range that spans n databases; usually n is small (1-3), but it's possible I'll need to query the whole set.

Any thoughts on the most efficient way to do this, both from a code-cleanliness and a performance perspective?

Current solutions are rather ad hoc: there is a collection of views that span all of the databases (or just the most recent one), and other solutions generate dynamic SQL that figures out which database(s) contain the data sought.

Obviously the ideal solution would be to partition the tables, but I can't do this because it's a 3rd-party-supplied database.

Dave

EDIT: I can't combine the databases since they are 3rd-party controlled. The total data size is about 50 GB, so not huge; the largest tables contain around 1.5m rows per quarter.

EDIT2: A data warehouse is definitely the right solution long term (it's in the plan), but I can't do this today :(

A: 

Depending on the size of the databases, it might actually be better to consolidate them into one database and properly index them.

You can write your own SSIS package and schedule it to consolidate data periodically (daily / hourly / etc).
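
If a full SSIS package is more than you want to maintain, a scheduled T-SQL job can do the same copy. A minimal sketch, where ArchiveAll, Archive_2009Q3, dbo.MyTable, and CreatedOn are all placeholder names:

-- Copy rows the consolidated database doesn't have yet; run from a
-- SQL Agent job on whatever schedule suits (daily / hourly / etc).
-- Assumes identical schemas and no identity columns to fight with.
INSERT INTO ArchiveAll.dbo.MyTable
SELECT src.*
FROM Archive_2009Q3.dbo.MyTable AS src
WHERE src.CreatedOn > COALESCE(
    (SELECT MAX(CreatedOn) FROM ArchiveAll.dbo.MyTable), '19000101')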

Raj More
+2  A: 

Here's something that's going to do it!

DECLARE
    @Database varchar(128),
    @Sql      varchar(8000)

DECLARE DBName CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE name LIKE 'Your_DB_Names%'

OPEN DBName

WHILE (1 = 1)
BEGIN
    FETCH NEXT FROM DBName INTO @Database

    IF @@FETCH_STATUS = -1 BREAK
    IF @@FETCH_STATUS = -2 CONTINUE

    -- The USE must live in the same dynamic batch as the query;
    -- a separate EXECUTE('USE ...') changes context only for the
    -- duration of that EXECUTE and is lost immediately afterwards.
    SET @Sql = 'USE ' + @Database + '; SELECT * FROM dbo.MyTable'  -- your query here
    PRINT @Sql
    EXECUTE (@Sql)
END

CLOSE DBName
DEALLOCATE DBName

Danielle
SHEESH!! Don't start using darn slow **CURSORS** just for that!! Really bad idea.....
marc_s
Arguably the cursor is only used to get the db name, not to iterate through the resultset of the actual SQL query.
Chris Kaminski
The cost of the cursor here would be minimal compared to yanking 30m rows out of the database.
David Hayes
sp_msForEachDb uses what I would call a very ugly cursor...
Philip Kelley
+2  A: 

I've done this often, and let me tell you, keeping separate databases is a PAIN IN THE ASS. It forces you to do all sorts of logic like this all over the place; it sort of breaks the encapsulation that a database provides in the first place.

What you are looking at is a data warehouse. You should look into consolidating all of your databases into one and making it read-only. Then you take nightly/hourly incremental backups of your live data and restore them against your warehouse. Then your warehouse is always up to date, and you run your reporting against that instead of the live data.
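
One way to keep the warehouse copy queryable between loads is to restore log backups WITH STANDBY. A minimal sketch (database names and file paths are placeholders, and it assumes the warehouse was seeded from a full backup restored WITH NORECOVERY):

-- On the live server: take the periodic log backup
BACKUP LOG LiveDB TO DISK = 'D:\Backups\LiveDB.trn'

-- On the warehouse server: apply it, leaving the database
-- read-only and queryable until the next restore
RESTORE LOG WarehouseDB
FROM DISK = 'D:\Backups\LiveDB.trn'
WITH STANDBY = 'D:\Backups\WarehouseDB_undo.dat'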

This has the upshot of keeping your reports from killing your live production databases, and I'd guess upwards of 90% of business needs don't require 100% accurate just-in-time numbers.

Do the hard stuff once - create a warehouse. :-)

EDIT

Something I've done in the past is to create a view over the tables I use, using linked servers if the databases are on other machines:

Create view view_table as 
    select * from activedb.dbo.table
    union all  -- union all: the archives don't overlap, so skip the dedupe sort
    select * from db1.dbo.table 
    union all
    select * from db2.dbo.table

Hideous, performance-wise, but it solves the problem neatly. Then you still only have the one-time setup issue (creating a view per table you wish to query) and a centralized place to modify to keep your database list up to date for ongoing maintenance, as opposed to keeping N reports up to date.

Chris Kaminski
A data warehouse is definitely in the plan for the future (maybe next year if I get my way) but I still have my short term problem
David Hayes
The view approach is what we use now, but I'm suffering from view proliferation: I have one that goes back 2 quarters, one that only looks at the latest version of the order ... It works, but it's a pain; the joy of inheriting another person's 'design'.
David Hayes
Then probably the best thing for you to do is to use a scripting tool of some sort to rebuild your views every time you create a database. You could conceivably do that by running a cursor through master..sysdatabases grabbing the db names, and building your view dynamically. It is a pain, but it beats having to custom write every report. I'd ideally even put the views in another db, if you can get away with it. Keeps your proliferation to a minimum in your working database, and still lets you get a unified view of your data.
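
A sketch of that idea, using string concatenation rather than an explicit cursor (the 'Archive%' name pattern and dbo.MyTable are placeholders):

DECLARE @Sql varchar(8000)
SET @Sql = ''

-- Build one UNION ALL branch per archive database
SELECT @Sql = @Sql
    + CASE WHEN @Sql = '' THEN '' ELSE ' UNION ALL ' END
    + 'SELECT * FROM ' + name + '.dbo.MyTable'
FROM master..sysdatabases
WHERE name LIKE 'Archive%'

-- Recreate the view from the generated SELECT list
IF OBJECT_ID('dbo.view_table') IS NOT NULL
    EXECUTE ('DROP VIEW dbo.view_table')
EXECUTE ('CREATE VIEW dbo.view_table AS ' + @Sql)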
Chris Kaminski
+5  A: 

One way to do this: use sp_msForEachDb.

-- Round 1 -------

Call this system procedure with a varchar parameter. (It's actually a LOT messier than this, check the code in the master database if you want to know what it's really doing.) The parameter must be a chunk of dynamic code -- for example,

DECLARE @DemoParameter varchar(1000)
SET @DemoParameter = 'SELECT MyCol from MyTable where CreatedOn between ''Jan 1, 1980'' and ''Dec 21, 2012'''
EXECUTE sp_msForEachDb @DemoParameter

This would run the query against every database on the SQL instance, returning one set per database -- except for those databases that didn't have the necessary table(s), which would throw an error (particularly the system databases). This leads us to...

-- Round 2 ---------

Within the dynamic code, as the databases are iterated over, every instance of the question mark (?) is replaced with the name of the database currently being processed. You can use this to filter which databases are to be processed and which aren't. Note also that the "current" database will not be changed by the routine; you have to do that yourself. This gives us code like:

SET @DemoParameter = '
IF ''?'' like ''%Foo%''
 BEGIN
    USE ?
    SELECT MyCol from MyTable where CreatedOn between ''Jan 1, 1980'' and ''Dec 21, 2012''
 END
'
EXECUTE sp_msForEachDb @DemoParameter

This would run the query only against those databases whose names contain the characters "foo". Possibly you could check for the presence of the table within each database; other methods suggest themselves.
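
For instance, the table-presence check might look like this (a sketch, untested, with dbo.MyTable as a placeholder):

EXECUTE sp_msForEachDb '
IF OBJECT_ID(''?.dbo.MyTable'') IS NOT NULL
 BEGIN
    USE ?
    SELECT MyCol from MyTable where CreatedOn between ''Jan 1, 1980'' and ''Dec 21, 2012''
 END
'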

This will shotgun back one dataset for each database, which doesn't help too much if you need them all in one neat and orderly data set, and that gets us to...

-- Round 3 ------------

Briefly: create a temp table, and populate it from within the dynamic query. As I show below, you can include the name of the database, and even the server name -- very useful when you're questing for lost data across dozens of databases spread across a handful of servers.

Create (or clear) the temp table:

IF object_id('tempdb.dbo.##Foo') is null
    CREATE TABLE ##Foo
     (
       ServerName         varchar(100)  not null
      ,DBName             varchar(100)  not null

      --  Add your own columns here
      ,MyCol              int  not null
     )

ELSE
    --Option: Delete this line to not clear on each run
    TRUNCATE TABLE ##Foo

Run the code (this is my main template, you can easily work @DemoParameter back in there):

EXECUTE sp_msForEachDB '
IF ''?'' like  ''%Foo%''
 BEGIN
    USE ?

    INSERT ##Foo
     select @@servername, db_name()
       ,MyCol
      from MyTable
 END
'

...and that should produce a single temp table with your data. Test this out; I wrote this without actually testing the code, and typso will silp in. (#temp tables should work as well as ##temp; I generally do this with ad-hoc system support issues.)
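
Once populated, the temp table queries like any ordinary table:

-- One consolidated, orderly result set across all matched databases
SELECT ServerName, DBName, MyCol
FROM ##Foo
ORDER BY DBName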

Philip Kelley
By the by, I generally agree with everyone else who says "combine your data". Very easy to say, but very hard to do. So long as you recognize that routines like mine are stopgaps or workarounds, and you learn from them and never architect database systems like this yourself, you should be good.
Philip Kelley