views:

105

answers:

2

I have a query that's selecting a bunch of fields related to names and addresses of customers but it boils down to:

SELECT DISTINCT a, b, c, ... FROM big_dumb_flat_table

it returns a bunch of records (10986590). When I replace the commas in the select-list to format it as a pipe-separated concatenated string:

SELECT DISTINCT a + '|' + b + '|' + c + '|' + ... FROM big_dumb_flat_table

it's returning 248 more records. I've reassured myself that there are no pipes in any of the fields that could be screwing the fidelity of the returned set. What's going on here?

+10  A: 

Trailing spaces could cause this. For string comparisons these are ignored.

CREATE TABLE #T
(
a varchar(10),
b varchar(10),
c varchar(10)
)

INSERT INTO #T
SELECT 'a ' as a, 'b' as b, 'c ' as c union all
SELECT 'a' as a, 'b' as b, 'c ' as c

SELECT DISTINCT a, b, c  
FROM #T /*1 result*/

SELECT DISTINCT a + '|' + b + '|' + c + '|'   
FROM #T /*2 results*/


SELECT DISTINCT LTRIM(RTRIM(a)) + '|' + LTRIM(RTRIM(b)) + '|' +
                LTRIM(RTRIM(c)) + '|'   
FROM #T /*1 result*/
Martin Smith
Neat. I ran the abbreviated code you first posted to see for myself that you were right and now I'm testing the two queries with all fields wrapped in ltrim(rtrim()). That should eliminate this problem, right?
clweeks
And he should be handling nulls as well.
HLGEM
@clweeks - Are these columns nullable? If so you'll need to remember to handle Nulls with COALESCE or something. (NB: If this is just a one off task you could do `SET CONCAT_NULL_YIELDS_NULL OFF` but not something for production code!)
Martin Smith
LTRIM(RTRIM()) on all the fields caused the counts to match. Some of the columns are nullable but there aren't any null values. Thanks a ton!
clweeks
+2  A: 

The really aren't any scenarios that I can think of that would get you MORE records, only fewer. I would simplify the query by only selecting a + '|', then add more columns as you go.

bzarah