views:

344

answers:

3

i have a query that retruns rows that i want, e.g.

QuestionID  QuestionTitle  UpVotes  DownVotes  
==========  =============  =======  =========  
2142075     Win32: Cre...        0          0  
2232727     Win32: How...        2          0  
1870139     Wondows Ae...       12          0  

Now i want to have a column returned that contains a comma separated list of "Authors" (e.g. original poster and editors). e.g.:

QuestionID  QuestionTitle  UpVotes  DownVotes  Authors
==========  =============  =======  =========  ==========
2142075     Win32: Cre...        0          0  Ian Boyd  
2232727     Win32: How...        2          0  Ian Boyd, roygbiv
1870139     Wondows Ae...       12          0  Ian Boyd, Aaron Klotz, Jason Diller, danbystrom


Faking It

SQL Server 2000 does not have a CONCAT(AuthorName, ', ') aggregation operation, i've been faking it - performing simple sub-selects for the TOP 1 author, and the author count.

QuestionID  QuestionTitle  UpVotes  DownVotes  AuthorCount  FirstAuthor
==========  =============  =======  =========  ===========  =========== 
2142075     Win32: Cre...        0          0            1  Ian Boyd
2232727     Win32: How...        2          0            2  Ian Boyd
1870139     Wondows Ae...       12          0            3  Ian Boyd

If there is more than one author, then i show the user an ellipses ("…"), to indicate there is more than one. e.g. the user would see:

QuestionID  QuestionTitle  UpVotes  DownVotes  Authors
==========  =============  =======  =========  ==========
2142075     Win32: Cre...        0          0  Ian Boyd
2232727     Win32: How...        2          0  Ian Boyd, …
1870139     Wondows Ae...       12          0  Ian Boyd, …

And that works well enough, since normally a question isn't edited - which means i'm supporting the 99% case perfectly, and the 1% case only half-assed as well.


Threaded Re-query

As a more complicated, and bug-prone solution, i was thinking of iterating the displayed list, and spinning up a thread-pool worker thread for each "question" in the list, perform a query against the database to get the list of authors, then aggregating the list in memory. This would mean that the list fills first in the (native) application. Then i issue a few thousand individual queries afterwards.

But that would be horribly, horrendously, terribly, slow. Not to mention bug-riddled, since it will be thread work.


Yeah yeah yeah

Adam Mechanic says quite plainly:

Don't concatenate rows into delimited strings in SQL Server. Do it client side.

Tell me how, and i'll do it.


/cry

Can anyone think of a better solution, that is as fast (say...within an order of magnitude) than my original "TOP 1 plus ellipses" solution?

For example, is there a way to return a results set, where reach row has an associated results set? So for each "master" row, i could get at a "detail" results set that contains the list.


Code for best answer

Cade's link to Adam Machanic's solution i like the best. A user-defined function, that seems to operate via magic:

CREATE FUNCTION dbo.ConcatAuthors(@QuestionID int)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @Output VARCHAR(8000)
    SET @Output = ''

    SELECT @Output =    CASE @Output 
                WHEN '' THEN AuthorName 
                ELSE @Output + ', ' + AuthorName 
                END
    FROM  (
        SELECT QuestionID, AuthorName, QuestionDate AS AuthorDate FROM Questions
        UNION
        SELECT QuestionID, EditorName, EditDate FROM QuestionEdits
    ) dt
    WHERE dt.QuestionID = @QuestionID
    ORDER BY AuthorDate

    RETURN @Output
END

With a T-SQL usage of:

SELECT QuestionID, QuestionTitle, UpVotes, DownVotes, dbo.ConcatAuthors(AuthorID)
FROM Questions
+2  A: 

Have a look at these articles:

http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/rowset-string-concatenation-which-method-is-best.aspx

http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/ (See Phil Factor's cross join solution in the responses - which will work in SQL Server 2000)

Obviously in SQL Server 2005, the FOR XML trick is easiest, most flexible and generally most performant.

As far as returning a rowset for each row, if you still want to do that for some reason, you can do that in a stored procedure, but the client will need to consume all the rows in the first rowset and then go to the next rowset and associate it with the first row in the first rowset, etc. Your SP would need to open a cursor on the same set it returned as the first rowset and run multiple selects in sequence to generate all the child rowsets. It's a technique I've done, but only where ALL the data actually was needed (for instance, in a fully-populated tree view).

And regardless of what people say, doing it client-side is often a very big waste of bandwidth, because returning all the rows and doing the looping and breaking in the client side means that huge number of identical columns are being transferred at the start of each row just to get the changing column at the end of the row.

Wherever you do it, it should be an informed decision based on your use case.

Cade Roux
+1 for Adam Machanic's solution. And another +1 for you, sir, for mentioning the bandwidth cost in doing it "in client", giving many duplicated rows.
Ian Boyd
@Ian Boyd I did not realize that a FOR XML RAW solution will work in SQL Server 2000 - I would totally use that option. I use FOR XML PATH all the time and it works great.
Cade Roux
A: 

I'm not sure if this works in SQL Server 2000, but you can try it:

--combine parent and child, children are CSV onto parent row
CREATE TABLE #TableA (RowID int, Value1 varchar(5), Value2 varchar(5))
INSERT INTO #TableA VALUES (1,'aaaaa','A')
INSERT INTO #TableA VALUES (2,'bbbbb','B')
INSERT INTO #TableA VALUES (3,'ccccc','C')

CREATE TABLE #TableB (RowID int, TypeOf varchar(10))
INSERT INTO #TableB VALUES (1,'wood')
INSERT INTO #TableB VALUES (2,'wood')
INSERT INTO #TableB VALUES (2,'steel')
INSERT INTO #TableB VALUES (2,'rock')
INSERT INTO #TableB VALUES (3,'plastic')
INSERT INTO #TableB VALUES (3,'paper')


SELECT
    a.*,dt.CombinedValue
    FROM #TableA        a
        LEFT OUTER JOIN (SELECT
                             c1.RowID
                                 ,STUFF(
                                          (SELECT
                                               ', ' + TypeOf
                                               FROM (SELECT
                                                         a.RowID,a.Value1,a.Value2,b.TypeOf
                                                         FROM #TableA                 a
                                                             LEFT OUTER JOIN #TableB  b ON a.RowID=b.RowID
                                                    ) c2
                                               WHERE c2.rowid=c1.rowid
                                               ORDER BY c1.RowID, TypeOf
                                               FOR XML PATH('') 
                                          )
                                          ,1,2, ''
                                       ) AS CombinedValue
                             FROM (SELECT
                                       a.RowID,a.Value1,a.Value2,b.TypeOf
                                       FROM #TableA                 a
                                           LEFT OUTER JOIN #TableB  b ON a.RowID=b.RowID
                                  ) c1
                             GROUP BY RowID
                        ) dt ON a.RowID=dt.RowID

OUTPUT from SQL Server 2005:

RowID       Value1 Value2 CombinedValue
----------- ------ ------ ------------------
1           aaaaa  A      wood
2           bbbbb  B      rock, steel, wood
3           ccccc  C      paper, plastic

(3 row(s) affected)

EDIT query that replaces FOR XML PATH with FOR XML RAW, so this should work on SQL Server 2000

SELECT
    a.*,dt.CombinedValue
    FROM #TableA        a
        LEFT OUTER JOIN (SELECT
                             c1.RowID
                                 ,STUFF(REPLACE(REPLACE(
                                          (SELECT 
                                               ', ' + TypeOf as value
                                               FROM (SELECT
                                                         a.RowID,a.Value1,a.Value2,b.TypeOf
                                                         FROM #TableA                 a
                                                             LEFT OUTER JOIN #TableB  b ON a.RowID=b.RowID
                                                    ) c2
                                               WHERE c2.rowid=c1.rowid
                                               ORDER BY c1.RowID, TypeOf
                                               FOR XML RAW
                                          )
                                         ,'<row value="',''),'"/>','')
                                   , 1, 2, '') AS CombinedValue
                             FROM (SELECT
                                       a.RowID,a.Value1,a.Value2,b.TypeOf
                                       FROM #TableA                 a
                                           LEFT OUTER JOIN #TableB  b ON a.RowID=b.RowID
                                  ) c1
                             GROUP BY RowID
                        ) dt ON a.RowID=dt.RowID

OUTPUT, same as original query

KM
FOR XML PATH('') does not work with SQL 2000.
G Mastros
@G Mastros, thanks, I've updated it with FOR XML RAW, which is supported by SQL Server 2000: http://www.mssqltips.com/tip.asp?tip=1077
KM
A: 

You can also take a look to this script. It's basically the cross join approach that Cade Roux also mentioned in his post.

The above approach looks very clean: you have to do a view first and secondly create a statement based on the values in the view. The second sql statement you can build dynamically in your code, so it should be straight forward to use.

MicSim