views:

62

answers:

4

I have to write a query that performs a union between two tables with similar data. The results need to be distinct. The problem is that some fields that should match do not, because empty values are stored inconsistently: some as NULL and some as empty strings. Is there a better way to write the following query (without fixing the actual data to ensure proper defaults are set, etc.)? Will using Case When be a big performance hit?

Select  
    Case When Column1 = '' Then NULL Else Column1 End as [Column1],
    Case When Column2 = '' Then NULL Else Column2 End as [Column2]
From TableA

UNION ALL

Select 
    Case When Column1 = '' Then NULL Else Column1 End as [Column1],
    Case When Column2 = '' Then NULL Else Column2 End as [Column2]
From TableB
+1  A: 

A Case should perform fine, but IsNull is more natural in this situation. And if you're searching for distinct rows, doing a union instead of a union all will accomplish that (thanks to Jeffrey L Whitledge for pointing this out):

select  IsNull(col1, '')
,       IsNull(col2, '')
from    TableA
union
select  IsNull(col1, '')
,       IsNull(col2, '')
from    TableB
Andomar
I think you've got it backwards. The OP wants NULL when the column equals '', not the other way around.
Joe Stefanelli
Your characterization of `union` is incorrect for SQL Server. `union` will remove all duplicate rows from the results regardless of their source.
Jeffrey L Whitledge
@Joe Stefanelli: The way I read it, the OP just says he wants the same result.
@Jeffrey L Whitledge: Thanks, didn't know that! Answer edited.
Andomar
@Andomar: I now agree based on the comments the OP left on [my answer](http://stackoverflow.com/questions/3525396/best-way-to-write-union-query-when-dealing-with-null-and-empty-string-values/3525454#3525454).
Joe Stefanelli
+2  A: 

I don't think it would make any difference in performance, but NULLIF is another way to write this and, IMHO, looks a little cleaner.

Select  
    NULLIF(Column1, '') as [Column1],
    NULLIF(Column2, '') as [Column2]
From TableA

UNION

Select 
    NULLIF(Column1, '') as [Column1],
    NULLIF(Column2, '') as [Column2]
From TableB
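
To see what NULLIF does to individual values, here is a minimal sketch against made-up sample data. The `Sample` derived table and its values are assumptions for illustration, not the OP's tables, and the VALUES row constructor assumes SQL Server 2008 or later:

```sql
-- Hypothetical sample values: an empty string, a NULL, and a real value.
SELECT Column1             AS Original,
       NULLIF(Column1, '') AS Normalized  -- '' becomes NULL; anything else passes through
FROM (VALUES (''), (NULL), ('abc')) AS Sample(Column1);
```

The empty-string row and the NULL row should both come back with NULL in Normalized, which is what lets UNION treat them as duplicates.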
Joe Stefanelli
Thanks for the note about NULLIF. Which is better, to return '' or NULL?
dretzlaff17
It would really depend on what you're doing with the data after you've selected it. Again, no difference in performance to return one vs. the other.
Joe Stefanelli
Since performance is not an issue, I need to return NULL when '' is found for a record. Since I don't know of any reverse logic for IFNULL or NULLIF, I think I am going to keep it as is. Thanks to everyone for the input on this.
dretzlaff17
A: 

You can keep the manipulation separate from the union: do whatever manipulation you want (here, substituting NULL for the empty string) in a view over each table, then union the views.

You shouldn't have to write the same manipulation for both sets, though.

Instead, union the tables first, then apply the manipulation once to the combined set.

That way there is half as much manipulation code to maintain.
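
A rough sketch of the union-first approach, with assumptions labeled: the two CTEs below only stand in for the OP's TableA and TableB with invented rows so the statement runs on its own, and `Combined` is a hypothetical alias. The rows are combined with UNION ALL and de-duplicated by DISTINCT in the outer query, because duplicates have to be removed after the empty strings are normalized; an inner UNION would still treat '' and NULL as different rows.

```sql
-- Stand-ins for the real tables (invented sample rows, SQL Server 2008+ syntax).
WITH TableA AS (
    SELECT * FROM (VALUES ('x', ''), ('y', 'z')) AS a(Column1, Column2)
),
TableB AS (
    SELECT * FROM (VALUES ('x', NULL), ('y', 'z')) AS b(Column1, Column2)
)
SELECT DISTINCT
       NULLIF(Column1, '') AS Column1,   -- normalization written once
       NULLIF(Column2, '') AS Column2
FROM (
    SELECT Column1, Column2 FROM TableA
    UNION ALL                            -- keep every row; DISTINCT handles duplicates
    SELECT Column1, Column2 FROM TableB
) AS Combined;
```

With these sample rows the query should return two rows, since ('x', '') and ('x', NULL) collapse into one.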

Beth
+1  A: 

Use UNION to remove duplicates. It is slower than UNION ALL because of that extra duplicate-removal step:

SELECT CASE 
         WHEN LEN(LTRIM(RTRIM(column1))) = 0 THEN NULL
         ELSE column1
       END AS column1,
       CASE 
         WHEN LEN(LTRIM(RTRIM(column2))) = 0 THEN NULL
         ELSE column2
       END AS column2
  FROM TableA
UNION 
SELECT CASE 
         WHEN LEN(LTRIM(RTRIM(column1))) = 0 THEN NULL
         ELSE column1
       END,
       CASE 
         WHEN LEN(LTRIM(RTRIM(column2))) = 0 THEN NULL
         ELSE column2
       END 
  FROM TableB

I changed the logic to return NULL if the column value contains any number of spaces and no actual content.
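
A small sketch of how that CASE treats whitespace, using invented sample values (the `Sample` derived table is an assumption, and the VALUES row constructor assumes SQL Server 2008 or later). The brackets are only there to make the spaces visible:

```sql
-- Hypothetical inputs: empty string, spaces only, and padded real content.
SELECT '[' + Column1 + ']' AS Original,
       CASE
         WHEN LEN(LTRIM(RTRIM(Column1))) = 0 THEN NULL
         ELSE Column1                    -- returned untrimmed, padding included
       END AS Normalized
FROM (VALUES (''), ('   '), (' abc ')) AS Sample(Column1);
```

The first two rows should normalize to NULL, while the padded value is returned unchanged.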

CASE expressions are ANSI standard and more customizable than functions such as NULLIF.

OMG Ponies