views:

161

answers:

4

Hello.

I have a query with a long chain of CTEs which ends with

SELECT RegionName, AreaName, CityName, SubCityName, StreetName 
FROM tDictionaryStreets
UNION ALL
SELECT RegionName, AreaName, CityName, SubCityName, StreetName 
FROM tDictionaryRegions

The execution time of this query is 1450 ms. When I execute these 2 SELECTs separatly it takes much less time. For the query

SELECT RegionName, AreaName, CityName, SubCityName, StreetName 
FROM tDictionaryStreets

execution time is 106 ms. And for the query

SELECT RegionName, AreaName, CityName, SubCityName, StreetName 
FROM tDictionaryRegions

it's 20 ms.

Why UNION ALL increases the execution time in more than 10 times? What can I do to decrease it?

Thank you for your help.

UPDATED The whole query (I shortened it, but the problem still presents) is

WITH tFoundRegions AS
(
    SELECT KladrItemName FROM dbo.tBuiltKladrItemsWithQuants
    WHERE UserID = @UserID AND (indeces & 1) > 0
),
tFoundAreas AS
(
    SELECT KladrItemName FROM dbo.tBuiltKladrItemsWithQuants
    WHERE UserID = @UserID AND (indeces & 2) > 0
),
tFoundCities AS
(
    SELECT KladrItemName FROM dbo.tBuiltKladrItemsWithQuants
    WHERE UserID = @UserID AND (indeces & 4) > 0
),
tFoundSubCities AS
(
    SELECT KladrItemName FROM dbo.tBuiltKladrItemsWithQuants
    WHERE UserID = @UserID AND (indeces & 8) > 0
),
tFoundStreets AS
(
    SELECT KladrItemName FROM dbo.tBuiltKladrItemsWithQuants
    WHERE UserID = @UserID AND (indeces & 16) > 0
),
tDictionaryStreets AS
(
    SELECT DISTINCT
        CASE WHEN RegionName IN (SELECT KladrItemName FROM tFoundRegions) THEN RegionName ELSE NULL END RegionName
      , CASE WHEN AreaName IN (SELECT KladrItemName FROM tFoundAreas) THEN AreaName ELSE NULL END AreaName
      , CASE WHEN CityName IN (SELECT KladrItemName FROM tFoundCities) THEN CityName ELSE NULL END CityName
      , CASE WHEN SubCityName  IN (SELECT KladrItemName FROM tFoundSubCities) THEN SubCityName ELSE NULL END SubCityName
      , StreetName 
    FROM StreetNames
    WHERE StreetName IN (SELECT KladrItemName FROM tFoundStreets)
),
tMissingSubCities AS
(
    SELECT KladrItemName FROM tFoundSubCities
    WHERE KladrItemName NOT IN (SELECT SubCityName FROM tDictionaryStreets)
),
tDictionarySubCities AS
(
    SELECT DISTINCT 
        CASE WHEN RegionName IN (SELECT KladrItemName FROM tFoundRegions) THEN RegionName ELSE NULL END RegionName
      , CASE WHEN AreaName IN (SELECT KladrItemName FROM tFoundAreas) THEN AreaName ELSE NULL END AreaName
      , CASE WHEN CityName IN (SELECT KladrItemName FROM tFoundCities) THEN CityName ELSE NULL END CityName
      , SubCityName
      , NULL StreetName 
    FROM SubCityNames
    WHERE SubCityName IN (SELECT KladrItemName FROM tMissingSubCities)
)
SELECT RegionName, AreaName, CityName, SubCityName, StreetName 
FROM tDictionaryStreets
UNION ALL
SELECT RegionName, AreaName, CityName, SubCityName, StreetName 
FROM tDictionarySubCities
+2  A: 

Make sure you clear the execution + data caches between each test run.

e.g.

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

If you run with the UNION ALL first, and then run the 2 selects separately afterwards, the data will already be cached in memory making performance much better (therefore giving the false impression that the subsequent approach is quicker when it may not be).

If you used a UNION then that may well be slower as it has to apply a DISTINCT, but UNION ALL doesn't have to do that so it should be no different.

Update:
Have a look at the execution plans and compare them - see if there is any difference. You can view the execution plan by clicking the "Include Actual Execution Plan" button in SSMS before running the query

Update 2:
Based on full CTEs given, I think I'd be looking at optimising those - I don't think the UNION ALL is actually the problem.

IMHO, best thing to try is work through the CTEs one by one and try to optimise each one individually so that when you then combine them all in the main query, they perform better.

e.g. for tDictionaryStreets, how about trying this:

SELECT DISTINCT
    r.KladrItemName AS RegionName,
        a.KladrItemName AS AreaName,
        c.KladrItemName AS CityName,
        sc.KladrItemName AS SubCityName,
        s.StreetName      
FROM StreetNames s
    JOIN tFoundStreets fs ON s.StreetName = fs.KladrItemName
    LEFT JOIN tFoundRegions r ON s.RegionName = r.KladrItemName
    LEFT JOIN tFoundAreas a ON s.AreaName = a.KladrItemName
    LEFT JOIN tFoundCities c ON s.CityName = c.KladrItemName
    LEFT JOIN tFoundSubCities sc ON s.SubCityName = scc.KladrItemName

KladrItemName on each table should at least have an index on. Try reworking tDictionarySubCities in the same kind of way with joins too.

AdaTheDev
I tried execute the query a lot of times. The execution time changes insignificantly.
StuffHappens
Update added. Also, are you JUST running the 2 SELECTs when you're comparing performance or are you running the entire query you mention (with many CTEs)? i.e. could it be something else in the query as opposed to this specific UNION ALL part
AdaTheDev
The execution plan is huge. I can't read it at all... But there are difference: when query runs fast it uses 'Merge join' when slow - 'nested loops'. What does it mean? what can I do to make it use the same plan?
StuffHappens
Can you clarify in your question: are tDictionaryStreets and tDictionaryRegions CTEs? If so, can you include the full details of those. I don't think the UNION ALL is the cause of the apparent performance difference - we need to see the bigger picture.
AdaTheDev
I posted the whole query.
StuffHappens
A: 

Could you compare the execution plans? what is different? "Union all" should work fine, as there is no duplicate removal (this requires sorting, which is costly for large sets of data).

lmsasu
The execution plan is huge. When query runs fast it uses 'merge join', when slow - 'nested loops'. What does it mean?
StuffHappens
A: 

Have you tried flipping the selects to see if that many any difference

e.g.

SELECT RegionName, AreaName, CityName, SubCityName, StreetName  
FROM tDictionaryStreets 
UNION ALL 
SELECT RegionName, AreaName, CityName, SubCityName, StreetName  
FROM tDictionaryRegions

and

SELECT RegionName, AreaName, CityName, SubCityName, StreetName  
FROM tDictionaryRegions
UNION ALL
SELECT RegionName, AreaName, CityName, SubCityName, StreetName  
FROM tDictionaryStreets 

You could try Updating Statistics on both tables so see if that helps.
Also checking the execution plan might show up something.

kevchadders
I have. No changes.
StuffHappens
A: 

Could be network (unlikely) or memory. Depending on the number of rows each result set brings back. One way to check whether it is network or the server is to include client statistics in SSMS (Query - Include Client Statistics - SHIFT-ALT-S). At the bottom you can differentiate where the bulk of the time is spent.

Could you compare the execution plans? [...] lmsasu [...]When query runs fast it uses 'merge join', when slow - 'nested loops'.[...]

Can't comment yet but what you see in the execution plan is the difference between "joining" two result sets (merge join) and a RBAR (pronounce reebar - Row By Agonizing Row [Jeff Moden]) operation, commonly called a loop.

Merge Join: SQL finds two result sets with a common link and does a set based operation to bring the two sets together. Nested Loop: SQL can't find a common link and joins one row from set 1 to all rows from set 2 row by row and discards the ones that don't match.

Gut feeling is that SQL stumbles across the NULL results which are unknown results. Try assigning a value like "XYZ" (or anything known not to appear) that you can simply filter in the last query. This might avoid the Nested Loop in certain result sets as the values are determined and not unknown. Similar to:

[...]
tDictionarySubCities AS 
( 
    SELECT DISTINCT  
        CASE WHEN RegionName IN (SELECT KladrItemName FROM tFoundRegions) THEN RegionName ELSE 'XYZXYZ' END RegionName 
      , CASE WHEN AreaName IN (SELECT KladrItemName FROM tFoundAreas) THEN AreaName ELSE 'XYZXYZ' END AreaName 
      , CASE WHEN CityName IN (SELECT KladrItemName FROM tFoundCities) THEN CityName ELSE 'XYZXYZ' END CityName 
      , SubCityName 
      , NULL StreetName  
    FROM SubCityNames 
    WHERE SubCityName IN (SELECT KladrItemName FROM tMissingSubCities) 
) 
SELECT RegionName, AreaName, CityName, SubCityName, StreetName  
FROM tDictionaryStreets 
WHERE RegionName <> 'XYZ'
UNION ALL 
SELECT RegionName, AreaName, CityName, SubCityName, StreetName  
FROM tDictionarySubCities 
WHERE RegionName <> 'XYZ'
The result is less than 500 rows. No network is used.
StuffHappens