views:

338

answers:

8

I'm looking to refactor the below query to something more readable and modifiable. The first half is identical to the second, with the exception of the database queried from (table names are the same though.)

  SELECT
    Column 1 AS c1,
    ...
    Column N AS cN
  FROM
    database1.dbo.Table1

UNION

  SELECT
    'Some String' as c1,
    ...
    NULL as cN
  FROM
    database1.dbo.Table2

UNION

  SELECT
    Column 1 AS c1,
    ...
    Column N AS cN
  FROM
    database2.dbo.Table1

UNION

  SELECT
    'Some String' as c1,
    ...
    NULL as cN
  FROM
    database2.dbo.Table2

This query is the definition of DRY and is calling to me to be re-written, but I have no idea how!

EDIT: We can't use linq and we desire distinct results; I'm looking to make the query smaller in physical file size, not in results returned.

EDIT: The database I'm querying off is a proprietary ERP database. Restructuring it is not an option.

+2  A: 

One performance tip that I see off the bat is using UNION ALL instead of UNION unless you intentionally want distinct records. A simple UNION will eliminate duplicates which takes time. UNION ALL doesn't do that.

You could rewrite it with dynamic SQL and a loop but I think the result would be worse. If there is enough duplicate code to justify the dynamic sql approach, then I guess it could be justified.

Alternatively, have you considered moving the logic out of the stored procedure into something like LINQ? For many, this isn't an option so I'm just asking.

A final note: resist the urge to fix what's not broken just to make it look cleaner. If cleanup will aide in maintenance, verification, etc., then go for it.

Michael Haren
+3  A: 

I'm going to go out on a limb here, and say, based on the information you've given us;

That's as good as it's going to get

John MacIntyre
Is there anything else I can provide?
Gavin Miller
@LFSR-I was talking about the general assumptions we can make from the code. But no, I doubt more info would be much help. I betting it isn't worth it, but it almost looks like they have a business requirement, which the current datamodel doesn't fit. Refactoring the database 'may' be in order.
John MacIntyre
May be in order, but not likely (and not easily even imagined in a typical Enterprise system.)
le dorfier
@le dorfier-I wasn't expecting it would be.
John MacIntyre
I know, I wasn't disagreeing. Your answer is exactly right. I upvoted yours as well as adding my own comment.
le dorfier
+1  A: 

What's the problem? Too long? Too repetitive?

Sometimes you get ugly SQL - not much you can do about it.

I don't see any way to clean it up unless you want to use separate views and then union them together.

DJ
True--SQL is rarely pretty.
Michael Haren
Too repetitive. It's just the same thing 4 times over. And all of our sprocs look this way.
Gavin Miller
I don't know ... I can see the beauty in it. I've written some wickly awesome SQL .. some of it pages long ... and it's looked elegant to me.
John MacIntyre
+1  A: 

I vote for views, which impose near-enough zero overhead (OK, maybe a small compile-time cost but that ought to be all). Then your procs become something of the form

SELECT * FROM database1.view1
UNION
SELECT * FROM database1.view2
UNION
SELECT * FROM database2.view1
UNION
SELECT * FROM database2.view2

I'm not sure if I'd want to condense it any further, although I expect most platforms would tolerate it.

Mike Woodhouse
Which just increases complexity, unfortunately. If you change the statement, you end up needing to change it in 5 places, not 1.
le dorfier
Agreed - although there may be some scope, once we have extracted smaller, more "cohesive" elements (if we can use that term in relation to SQL) it may be possible to use some kind of code-generation to manage the views and bring the locus of change down to one place. Just speculation.
Mike Woodhouse
A: 

On the Dynamic SQL theme - here is a sample - not sure if it is any better. The benefit is you only have to write the SELECT list once.

DECLARE @Select1 varchar(1000)
DECLARE @Select2 varchar(1000)

DECLARE @SQL varchar(4000)


SET @Select1 = 'SELECT
    Column 1 AS c1,
    ...
    Column N AS cN'


SET @Select2 = 'SELECT
    ''Some String'' as c1,
    ...
    NULL as cN'


SET @SQL = @Select1 + ' FROM database1.dbo.Table1 '

SET @SQL = @SQL + ' UNION ' + @Select2 + ' FROM database1.dbo.Table2 '

SET @SQL = @SQL + ' UNION ' + @Select1 + ' FROM database2.dbo.Table1 '

SET @SQL = @SQL + ' UNION ' + @Select2 + ' FROM database2.dbo.Table2 '


EXEC @SQL
DJ
A: 

If all your procs look like this - you've probably got an architectural problem.

Are all your calls to table2 just have one useful field? (and because of UNION, end up having just one row?)

I totally second the idea of going with parameterized dynamic SQL and/or code-generation for this job, even going so far as to generate the column list dynamically using INFORMATION_SCHEMA. This isn't exactly what you need but it's a start (you might generate off a table of databases and tables):

DECLARE @template AS varchar(MAX)
SET @template = 'SELECT {@column_list} FROM {@database_name}.dbo.{@table_name}'
DECLARE @column_list AS varchar(MAX)

SELECT @column_list = COALESCE(@column_list + ',', '') + COLUMN_NAME
FROM database1.dbo.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table_name
ORDER BY ORDINAL_POSITION

DECLARE @sql AS varchar(MAX)
SET @sql = @template
SET @sql = REPLACE(@sql, '{@column_list}', @column_list)
SET @sql = REPLACE(@sql, '{@database_name}', @database_name)
SET @sql = REPLACE(@sql, '{@table_name}', @table_name)
Cade Roux
It is likely a DB architectural problem. We use a proprietary ERP. Their database structure is poor on a good day, suicidal on a bad day.
Gavin Miller
If the query is at all expensive in optimisation terms, this approach will eliminate an possibility of a stored query plan. Might be a good thing, I don't know. Perhaps the code generation could occur outside the proc under control of an automated build process?
Mike Woodhouse
Right, you can always code-generate your SPs - treat the templates as the real code and anything which is code generated as simply an output of your development process.
Cade Roux
+2  A: 

This is a pretty standard SQL pattern. Sometimes it's easy to inadvisedly transfer OOP/Procedural code principles like DRY to SQL, but they aren't necessarily transferable concepts.

Note how easily you can grok the entire logical design of the query, vs. hunting through submodules. If one of the subexpressions had an extra column, or columns reversed, it would stick out. This is basically a pretty simple SQL statement to grok as an execution unit, where disaggregating it would muddle it.

And when you're debugging, it's handy to be able to use the editor's text highlighting option to selectively exercise parts of the statement - a technique that doesn't exist in procedural code. OTOH, it can get messy trying to trace down all the pieces if they are scattered into views etc. Even CTEs can make this inconvenient.

le dorfier
A: 

Depending on the number of rows returned, you may be best using UNION ALL on the selects with a select distinct query around it. I've seen a similar problem before and had different execution plans for the two different styles

SELECT DISTINCT subquery.c1, subquery.cN
FROM
(
SELECT Column 1 AS c1, Column N AS cN FROM database1.dbo.Table1
UNION ALL
SELECT 'Some String' as c1, NULL as cN FROM database1.dbo.Table2
UNION ALL
SELECT Column 1 AS c1, Column N AS cN FROM database2.dbo.Table1
UNION ALL
SELECT 'Some String' as c1, NULL as cN FROM database2.dbo.Table2
) subquery
John