views:

112

answers:

5

I have two queries that I'm UNIONing together such that I already know there will be no duplicate elements between the two queries. Therefore, UNION and UNION ALL will produce the same results.

Which one should I use?

+11  A: 

You should use the one that matches the intent of what you are looking for. If you want to ensure that there are no duplicates, use UNION; otherwise use UNION ALL. Just because your data produces the same results right now doesn't mean that it always will.

That said, UNION ALL will be faster on any sane database implementation; see the articles below for examples. The two are typically the same except that UNION performs an extra step to remove identical rows (as one might expect), and that step may dominate execution time.
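To make the "intent" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `a` and `b` and their contents are hypothetical, chosen only for illustration. It shows that the two operators agree while the inputs do not overlap, and diverge as soon as they do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, not taken from the question.
conn.executescript("""
    CREATE TABLE a (id TEXT);
    CREATE TABLE b (id TEXT);
    INSERT INTO a VALUES ('x'), ('y');
    INSERT INTO b VALUES ('z');
""")

def run(op):
    # op is either "UNION" or "UNION ALL"; rows are sorted for easy comparison.
    sql = f"SELECT id FROM a {op} SELECT id FROM b"
    return sorted(conn.execute(sql).fetchall())

# Today the inputs do not overlap, so both operators return the same rows.
same_today = run("UNION") == run("UNION ALL")

# Later a value shows up in both tables; only UNION collapses it.
conn.execute("INSERT INTO b VALUES ('x')")
union_later = run("UNION")
union_all_later = run("UNION ALL")
```

If duplicates would be a bug in your result, UNION protects you from that later insert; if they are impossible or harmless, UNION ALL states that more directly.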

Daniel DiPaolo
Would you like some syrup with that waffle?
Mark Ransom
Given that the data is pulled from two unrelated tables which contain GUIDs, I know there is never going to be duplicate content between the two queries.
Billy ONeal
@Mark - It's not waffling, it's answering two flavors of the question. The main question asked is "Which one should I use?" and the answer is "use the one that matches your intent". The *implied* question is "which one would be faster to use?", and the answer there is `UNION ALL`
Daniel DiPaolo
@Billy ONeal Okay, that's one case where you can be extremely confident that data that produces no duplicates now will always produce no duplicates, at least on whatever architecture you're using right now, since I doubt you'll exhaust the GUID-space anytime soon :)
Daniel DiPaolo
A: 

I would use UNION ALL anyway. Even though you know that there are not going to be duplicates, depending on your database server engine, it might not know that.

So, just to give the DB server extra information so that its query planner can (probably) make a better choice, use UNION ALL.

Having said that, if your DB server's query planner is smart enough to infer that information from the UNION clause and table indexes, then the results should be the same in both performance and semantics.

Either way, it strongly depends on the DB server you are using.
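One way to check what a particular planner does is to ask it for the plan. The sketch below does that for SQLite through Python's `sqlite3` module; the tables `a` and `b` are hypothetical, and other engines have their own commands for showing a plan, so treat this as an illustration for one engine only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with unique keys, standing in for the GUID columns.
conn.executescript("""
    CREATE TABLE a (id TEXT PRIMARY KEY);
    CREATE TABLE b (id TEXT PRIMARY KEY);
""")

def plan(op):
    sql = f"EXPLAIN QUERY PLAN SELECT id FROM a {op} SELECT id FROM b"
    # The last column of each plan row holds the human-readable step.
    return [row[-1] for row in conn.execute(sql)]

union_plan = plan("UNION")
union_all_plan = plan("UNION ALL")

print(union_plan)
print(union_all_plan)
```

On SQLite the UNION plan typically mentions a temporary B-tree used for de-duplication while the UNION ALL plan does not, but the exact wording varies between versions, so read the printed plans rather than relying on a fixed string.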

Pablo Santa Cruz
+1  A: 

According to http://blog.sqlauthority.com/2007/03/10/sql-server-union-vs-union-all-which-is-better-for-performance/, UNION ALL is the better choice, at least for performance: it does not remove duplicates and is therefore faster.

Semyazas
+3  A: 

I see that you've tagged this question PERFORMANCE, so I assume that's your primary consideration.

UNION ALL will absolutely outperform UNION since SQL doesn't have to check the two sets for dups.

Unless you need SQL to perform the duplicate checking for you, always use UNION ALL.
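If you would rather measure than take this on faith, a rough timing harness is easy to put together. The sketch below uses SQLite via Python's `sqlite3` with two hypothetical tables of random GUIDs; the row count is arbitrary, and the timings it prints say nothing definitive about other engines or about your own data.

```python
import sqlite3
import time
import uuid

ROWS_PER_TABLE = 20000  # arbitrary size, chosen only to keep the run short

conn = sqlite3.connect(":memory:")
for table in ("a", "b"):
    # Hypothetical tables holding random GUIDs, as in the question's scenario.
    conn.execute(f"CREATE TABLE {table} (id TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?)",
        ((str(uuid.uuid4()),) for _ in range(ROWS_PER_TABLE)),
    )

def timed(op):
    # Count the combined rows so the whole compound query must be evaluated.
    sql = f"SELECT count(*) FROM (SELECT id FROM a {op} SELECT id FROM b)"
    start = time.perf_counter()
    (count,) = conn.execute(sql).fetchone()
    return count, time.perf_counter() - start

union_count, union_seconds = timed("UNION")
union_all_count, union_all_seconds = timed("UNION ALL")

print(f"UNION:     {union_count} rows in {union_seconds:.4f}s")
print(f"UNION ALL: {union_all_count} rows in {union_all_seconds:.4f}s")
```

A single run is noisy, so repeat it a few times and run it against your real schema before drawing conclusions.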

BradC
I think he already knows that. He's asking specifically about the scenario where he knows (and the DB probably also knows) that there are not going to be dups.
Pablo Santa Cruz
I don't get your point. Are you asking, "if SQL can determine that there will never be overlaps in the two sets, will it optimize the `UNION` so it doesn't have to compare, making it perform identically to `UNION ALL`?" Answer: NO. Use `UNION ALL`.
BradC
A: 

Since there will be no duplicates between the two queries, use UNION ALL. You don't need to check for duplicates, and UNION ALL will perform the task more efficiently.

Mark Mayfield