tags:

views:

124

answers:

8

SQL Experts,

Is there an efficient way to group runs of data together using SQL? Or is it going to be more efficient to process the data in code. For example if I have the following data:

ID|Name
01|Harry Johns
02|Adam Taylor
03|John Smith
04|John Smith
05|Bill Manning
06|John Smith

I need to display this:

Harry Johns
Adam Taylor
John Smith (2)
Bill Manning
John Smith

@Matt: Sorry I had trouble formatting the data using an embedded html table it worked in the preview but not in the final display.

A: 

For this particular case, all you need to do is group by the name and ask for the count, like this:

select Name, count(*)
from MyTable
group by Name

That'll get you the count for each name as a second column.

You can get it all as one column by concatenating like this:

select Name + ' (' + cast(count(*) as varchar) + ')'
from MyTable
group by Name
Matt Hamilton
+2  A: 

@Matt: that groups all the rows with a particular Name. The OP wanted to group consequtive rows, I'm not aware of a way to do it in SQL.

Nickolay
A: 

@Nickolay Ah you're right - now that he has reformatted his question I can see the data more clearly.

Yes, I'm not sure of a pure SQL way to do this either. You'd have to do it with a cursor, I think.

Matt Hamilton
+1  A: 

Well, this:

select Name, count(Id)
from MyTable
group by Name

will give you this:

Harry Johns, 1
Adam Taylor, 1
John Smith, 2
Bill Manning, 1

and this (MS SQL syntax):

select Name +
    case when ( count(Id) > 1 ) 
         then ' ('+cast(count(Id) as varchar)+')' 
         else ''
    end
from MyTable
group by Name

will give you this:

Harry Johns
Adam Taylor
John Smith (2)
Bill Manning

Did you actually want that other John Smith on the end of your results?

EDIT: Oh I see, you want consecutive runs grouped. In that case, I'd say you need a cursor or to do it in your program code.

Blorgbeard
+2  A: 

Try this:

select n.name, 
    (select count(*) 
     from myTable n1
     where n1.name = n.name and n1.id >= n.id and (n1.id <=
        (
        select isnull(min(nn.id), (select max(id) + 1 from myTable))
        from myTable nn
        where nn.id > n.id and nn.name <> n.name
        )
     ))
from myTable n
where not exists (
   select 1
   from myTable n3
   where n3.name = n.name and n3.id < n.id and n3.id > (
            select isnull(max(n4.id), (select min(id) - 1 from myTable))
            from myTable n4
            where n4.id < n.id and n4.name <> n.name
            )
)

I think that'll do what you want. Bit of a kludge though.

Phew! After a few edits I think I have all the edge cases sorted out.

Matt Hamilton
+2  A: 

I hate cursors with a passion... but here's a dodgy cursor version...

Declare @NewName Varchar(50)
Declare @OldName Varchar(50)
Declare @CountNum int
Set @CountNum = 0

DECLARE nameCursor CURSOR FOR 
SELECT Name
FROM NameTest
OPEN nameCursor

FETCH NEXT FROM nameCursor INTO @NewName

  WHILE @@FETCH_STATUS = 0 

    BEGIN

      if @OldName <> @NewName
      BEGIN
         Print @OldName + ' (' + Cast(@CountNum  as Varchar(50)) + ')'
         Set @CountNum = 0
      END
      SELECT @OldName = @NewName
      FETCH NEXT FROM nameCursor INTO @NewName
      Set @CountNum = @CountNum + 1

    END
Print @OldName + ' (' + Cast(@CountNum  as Varchar(50)) + ')'

CLOSE nameCursor
DEALLOCATE nameCursor
Leon Bambrick
+1  A: 

How about this:

declare @tmp table (Id int, Nm varchar(50));

insert @tmp select 1, 'Harry Johns';
insert @tmp select 2, 'Adam Taylor';
insert @tmp select 3, 'John Smith';
insert @tmp select 4, 'John Smith';
insert @tmp select 5, 'Bill Manning';
insert @tmp select 6, 'John Smith';

select * from @tmp order by Id;

select Nm, count(1) from 
(
select Id, Nm, 
    case when exists (
     select 1 from @tmp t2 
     where t2.Nm=t1.Nm 
     and (t2.Id = t1.Id + 1 or t2.Id = t1.Id - 1)) 
     then 1 else 0 end as Run
from @tmp t1
) truns group by Nm, Run

[Edit] That can be shortened a bit

select Nm, count(1) from (select Id, Nm, case when exists (
     select 1 from @tmp t2 where t2.Nm=t1.Nm 
     and abs(t2.Id-t1.Id)=1) then 1 else 0 end as Run
from @tmp t1) t group by Nm, Run
Eric Z Beard
+2  A: 

My solution just for kicks (this was a fun exercise), no cursors, no iterations, but i do have a helper field

-- Setup test table
DECLARE @names TABLE    (
                        id      INT                 IDENTITY(1,1),
                        name    NVARCHAR(25)        NOT NULL,
                        grp     UNIQUEIDENTIFIER    NULL
                        )

INSERT @names (name)
SELECT 'Harry Johns'    UNION ALL 
SELECT 'Adam Taylor'    UNION ALL
SELECT 'John Smith'     UNION ALL
SELECT 'John Smith'     UNION ALL
SELECT 'Bill Manning'   UNION ALL
SELECT 'Bill Manning'   UNION ALL
SELECT 'Bill Manning'   UNION ALL
SELECT 'John Smith'     UNION ALL
SELECT 'Bill Manning'   

-- Set the first id's group to a newid()
UPDATE      n
SET         grp = newid()
FROM        @names n
WHERE       n.id = (SELECT MIN(id) FROM @names)

-- Set the group to a newid() if the name does not equal the previous
UPDATE      n
SET         grp = newid()
FROM        @names n
INNER JOIN  @names b
        ON  (n.ID - 1) = b.ID
        AND ISNULL(b.Name, '') <> n.Name

-- Set groups that are null to the previous group
-- Keep on doing this until all groups have been set
WHILE (EXISTS(SELECT 1 FROM @names WHERE grp IS NULL))
BEGIN
    UPDATE      n
    SET         grp = b.grp
    FROM        @names n
    INNER JOIN  @names b
            ON  (n.ID - 1) = b.ID
            AND n.grp IS NULL
END

-- Final output
SELECT      MIN(id)     AS id_start,
            MAX(id)     AS id_end,
            name,
            count(1)    AS consecutive
FROM        @names
GROUP BY    grp, 
            name
ORDER BY    id_start

/*
Results:

id_start    id_end  name            consecutive
1           1     Harry Johns     1
2           2     Adam Taylor     1
3           4     John Smith     2
5           7     Bill Manning    3
8           8     John Smith     1
9           9     Bill Manning    1
*/
Jon Erickson