views:

52

answers:

4

I'm stuck on an aggregation problem that I can't get to the bottom of.

I have some data which is best summarized as follows

id |phraseId|seqNum|word
=========================
1  |1       |1     |hello
2  |1       |2     |world
3  |2       |1     |black
4  |2       |2     |and
5  |2       |3     |white

I'd like a query that gives back the following data:

phraseId|completePhrase
========================
1       |hello world
2       |black and white

Anyone?

EDIT

I notice all the provided solutions use FOR XML PATH. What is this magic?

+1  A: 

One solution is to create an UDF using an FOR XML PATH expression.

  • the UDF handles the concatenation of one PhraseID
  • it can be used in a normal select

SQL Statement

SELECT  PhraseID, dbo.UDF_ConcatWord(PhraseID)
FROM    Phrases
GROUP BY PhraseID

Creating the UDF

CREATE FUNCTION dbo.UDF_ConcatWord(@phraseID INT) RETURNS VARCHAR(8000) AS
BEGIN  
  DECLARE @r VARCHAR(8000)
  SELECT @r = (
    SELECT  word + ', '
    FROM    Phrases
    WHERE   phraseID = @phraseID
    FOR XML PATH('')
  )
  IF LEN(@r) > 0 SET @r = SUBSTRING(@r, 1, LEN(@r)-1)
  RETURN @r
END
GO

edit

After revising some of the links myself, an even shorter solution is

SQL Statement

SELECT  DISTINCT p1.PhraseID
        , STUFF(( SELECT  ' ' + p2.word 
                  FROM    Phrases AS p2 
                  WHERE   p2.PhraseID = p1.PhraseID 
                  FOR XML PATH('')), 1, 1, '') AS completePhrase
FROM      Phrases AS p1
ORDER BY  p1.PhraseID
Lieven
KM
OK. Thanks for all the answers. I've gone with this form, but they were all excellent. Thanks
spender
+1  A: 

I have cheated a bit by assuming that you have a table which holds the header record for each phrase. If this is missing, you could construct it by selecting a distinct list of phraseIDs from the table containing the words:

declare @words table
(id int
,phraseId int
,seqNum int
,word varchar(10)
)

insert @words
select 1,1,1,'hello'
union select 2,1,2,'world'
union select 3,2,1,'black'
union select 4,2,2,'and'
union select 5,2,4,'white'

declare @phrase table
(phraseId int)

insert @phrase
select 1
union select 2

select phraseID
       ,phraseText AS completePhrase
FROM @phrase AS p
CROSS APPLY (select word + ' ' as [text()]
             from @words AS w
             where w.phraseID = p.phraseID
             for xml path('')
            ) as phrases (phraseText)
Ed Harper
Good, but your assumption of a header record is unnecessary. For instance:select phraseID,phraseText AS completePhraseFROM (select distinct phraseId from @words) AS pCROSS APPLY (select word + ' ' as [text()] from @words AS w where w.phraseID = p.phraseID for xml path('') ) as phrases (phraseText)
spender
@spender - Isn't that exactly what I said at the start of my answer?
Ed Harper
Sorry... missed that. Scratch my last comment!
spender
+2  A: 

try this:

DECLARE @TableA  table (RowID int, phraseId varchar(5),seqNum int, word varchar(5))

INSERT INTO @TableA VALUES (1,1,1,'hello')
INSERT INTO @TableA VALUES (2,1,2,'world')
INSERT INTO @TableA VALUES (3,2,1,'black')
INSERT INTO @TableA VALUES (4,2,2,'and')
INSERT INTO @TableA VALUES (5,2,3,'white')

SELECT
    c1.phraseId
        ,STUFF(
                 (SELECT
                      ' ' + word
                      FROM @TableA  c2
                      WHERE c2.phraseId=c1.phraseId
                      ORDER BY c1.phraseId, seqNum
                      FOR XML PATH('') 
                 )
                 ,1,1, ''
              ) AS CombinedValue
    FROM @TableA c1
    GROUP BY c1.phraseId
    ORDER BY c1.phraseId

OUTPUT:

phraseId CombinedValue
-------- --------------------------
1        hello world
2        black and white

(2 row(s) affected)
KM
A: 

In the end I used Lieven's second answer, but found that for certain string combinations, the FOR XML PATH('') trick causes problems to occur:

declare @phrases table
(
    id int
    ,phraseId int
    ,seqNum int
    ,word varchar(10)
)

insert 
    @phrases 
values
    (1,1,1,'hello'),
    (2,1,2,'world'),
    (3,2,1,'black'),
    (4,2,2,'and'),
    (5,2,3,'white')

SELECT  
    DISTINCT p1.PhraseID, 
    STUFF(
        ( 
            SELECT  
                ' ' + p2.word 
            FROM    
                @phrases AS p2 
            WHERE   
                p2.PhraseID = p1.PhraseID 
            FOR XML PATH('')
        ), 1, 1, '') AS completePhrase
FROM      
    @phrases AS p1
ORDER BY  
    p1.PhraseID

works fine, but if the example uses characters that would need escaping were they used in an XML, problems occur. For instance, running the following data through it:

insert 
    @words 
values
    (1,1,1,'hello>'), --notice the less than symbol
    (2,1,2,'world'),
    (3,2,1,'black')

Gives

hello> world

and also if the source table is declared out of order, an order by is required

A small mod to the original query fixes all:

SELECT  
    DISTINCT p1.PhraseID, 
    STUFF(
        ( 
            SELECT  
                ' ' + p2.word 
            FROM    
                @words AS p2 
            WHERE   
                p2.PhraseID = p1.PhraseID 
            ORDER BY
                p2.seqNum  --required
            FOR XML PATH(''),TYPE
        ).value('.','nvarchar(4000)'), 
        1, 
        1, 
        ''
    ) AS completePhrase
FROM      
    @words AS p1
ORDER BY  
    p1.PhraseID

(see http://stackoverflow.com/questions/1051362)

spender