views:

61

answers:

3

I've got a table ItemValue full of data on a SQL 2005 Server running in 2000 compatibility mode that looks something like (it's a User-Defined values table):

ID    ItemCode     FieldID   Value
--    ----------   -------   ------
 1    abc123             1   D
 2    abc123             2   287.23
 4    xyz789             1   A
 5    xyz789             2   3782.23
 6    xyz789             3   23
 7    mno456             1   W
 9    mno456             3   45
                                 ... and so on.

FieldID comes from the ItemField table:

ID   FieldNumber   DataFormatID   Description   ...
--   -----------   ------------   -----------
 1             1              1   Weight class
 2             2              4   Cost
 3             3              3   Another made up description
 .             .              x   xxx
 .             .              x   xxx
 .             .              x   xxx
 x             91  (we have 91 user-defined fields)

Because I can't PIVOT in 2000 mode, we're stuck building an ugly query using CASEs and GROUP BY to get the data to look how it should for some legacy apps, which is:

ItemNumber   Field1   Field2    Field3 .... Field51
----------   ------   -------   ------
    abc123   D        287.23    NULL
    xyz789   A        3782.23   23
    mno456   W        NULL      45

You can see we only need this table to show values up to the 51st UDF. Here's the query:

SELECT
    iv.ItemNumber,
    ,MAX(CASE WHEN f.FieldNumber = 1 THEN iv.[Value] ELSE NULL END) [Field1]
    ,MAX(CASE WHEN f.FieldNumber = 2 THEN iv.[Value] ELSE NULL END) [Field2]
    ,MAX(CASE WHEN f.FieldNumber = 3 THEN iv.[Value] ELSE NULL END) [Field3]
        ...
    ,MAX(CASE WHEN f.FieldNumber = 51 THEN iv.[Value] ELSE NULL END) [Field51]
FROM ItemField f
LEFT JOIN ItemValue iv ON f.ID = iv.FieldID
WHERE f.FieldNumber <= 51
GROUP BY iv.ItemNumber

When the FieldNumber constraint is <= 51, the execute plan goes something like:

SELECT <== Computer Scalar <== Stream Aggregate <== Sort (Cost: 70%) <== Hash Match <== (Clustered Index Seek && Table Scan)

and it's fast! I can pull back 100,000+ records in about a second, which suits our needs.

However, if we had more UDFs and I change the constraint to anything above 66 (yes, I tested them one by one) or if I remove it completely, I lose the Sort in the Execution plan, and it gets replaced with a whole bunch of Parallelism blocks that gather, repartition, and distribute streams, and the entire thing is slow (30 seconds for even just 1 record).

FieldNumber has a clustered, unique index, and is part of composite primary key with the ID column (non-clustered index) in the ItemField table. The ItemValue table's ID and ItemNumber columns make a PK, and there is an extra non-clustered index on the ItemNumber column.

What is the reasoning behind this? Why does changing my simple integer constraint change the entire execution plan?

And if you're up to it... what would you do differently? There's a SQL upgrade planned for a couple months from now but I need to get this problem fixed before that.

+3  A: 

SQL Server is smart enough to take CHECK constraints into account when optimizing the queries.

Your f.FieldNumber <= 51 is optimized out and the optimizer sees that the whole two tables should be joined (which is best done with a HASH JOIN).

If you don't have the constraint, the engine needs to check the condition and most probably uses index traversal to do this. This may be slower.

Could please post the whole plans for the queries? Just run SET SHOWPLAN_TEXT ON and then the queries.

Update:

What is the reasoning behind this? Why does changing my simple integer constraint change the entire execution plan?

If by a constraint you mean the WHERE condition, this is probably the other thing.

Set operations (that's what SQL does) have no single most efficient algorithm: efficiency of each algorithm depends heavily on the data distribution in the sets.

Say, for taking a subset (that's what the WHERE clause does) you can either find the range of record in the index and use the index record pointers to locate the data rows in the table, or just scan all records in the table and filter them using the WHERE condition.

Efficiency of the former operation is m × const, that of the latter is n, where m is the number of record satisfying the condition, n is the total number of records in the table and const > 1.

This means that for larger values of m the fullscan is more efficient.

SQL Server is aware of that and changes execution plans accordingly to the constants that affect the data distribution in the set operations.

TO do this, SQL Server maintains statistics: aggregated histograms of the data distribution in each indexed column and uses them to build the query plans.

So changing the integer in the WHERE condition in fact affects the size and the data distribution of the underlying sets and makes SQL Server to reconsider the algorithms best fit to work with the sets of that size and layout.

Quassnoi
I don't see any mention of a CHECK constraint in OP...
Remus Rusanu
I will post the plans, but I have to make a couple changes so my real tables don't give any identifying information (my example is doctored).
Cory Larson
@Remus: When I read it I believed that *when the FieldNumber constraint is `<= 51`* was about a `CHECK` constraint, but you're probably right, the @op much have meant the `WHERE` condition.
Quassnoi
I'm accepting this answer for your description and for answering my question, though I'll be taking into account some of the other answers as they also provide some insight.Thanks!
Cory Larson
A: 

it gets replaced with a whole bunch of Parallelism blocks

Try this:

SELECT
    iv.ItemNumber,
    ,MAX(CASE WHEN f.FieldNumber = 1 THEN iv.[Value] ELSE NULL END) [Field1]
    ,MAX(CASE WHEN f.FieldNumber = 2 THEN iv.[Value] ELSE NULL END) [Field2]
    ,MAX(CASE WHEN f.FieldNumber = 3 THEN iv.[Value] ELSE NULL END) [Field3]
        ...
    ,MAX(CASE WHEN f.FieldNumber = 51 THEN iv.[Value] ELSE NULL END) [Field51]
FROM ItemField f
LEFT JOIN ItemValue iv ON f.ID = iv.FieldID
WHERE f.FieldNumber <= 51
GROUP BY iv.ItemNumber
OPTION (Maxdop 1)

By using Option(Maxdop 1), this should prevent the parellelism in the execution plan.

G Mastros
So this actually works, but you can't put OPTION in a view as I need it.
Cory Larson
A: 

At 66 you are hitting some internal cost estimate threshold that decides it is better to use one plan vs. the other. What that threshold is and why it happens is not really important. Note that your query differ with each FieldNumber value, as you are not only changing the WHERE: you also change the pseudo-'pivot' projected fields.

Now I don't know all the details of your table and your queries and insert/update/delete/pattern, but for the particular query you posted the proper clustered index structure for the ItemValue table is this:

CREATE CLUSTERED INDEX  [cdxItemValue] ON ItemValue (FieldID, ItemNumber);

This structure eliminate the need to intermediate sort the results for this 'pivot' query.

Remus Rusanu