ansaurus

Question

Counting consecutive duplicate records with SQL

Answer 1

A:

You can do that pretty easily with Common Table Expressions in SQL Server 2005.

ongle 2009-08-10 12:06:42

Answer 2

+1 A:

(Edited after comment)

You can do that by assigning a "head" number to each group of consecutive values. After that you select the head number for each row, and do an aggregate per head.

Here's an example, with CTEs for readability:

WITH
OrderedTable as (
 select value, rownr = row_number() over (order by userid, id)
 from YourTable
 where userid = 2287
),
Heads as (
 select cur.rownr, CurValue = cur.value
 , headnr = row_number() over (order by cur.rownr)
 from OrderedTable cur
 left join OrderedTable prev on cur.rownr = prev.rownr+1 
 where IsNull(prev.value,-1) != cur.value
),
ValuesWithHead as (
 select value
 , HeadNr = (select max(headnr) 
             from Heads 
             where Heads.rownr <= data.rownr)
 from OrderedTable data
)
select Value, [Count] = count(*)
from ValuesWithHead
group by HeadNr, value
order by count(*) desc

This will output:

Value   Count
2       4
3       3
1       2
2       1
2       1
7       1

Use "top 1" to select the first row only.

Here's my query to create the test data:

create table YourTable (
    id int primary key,
    userid int,
    variable varchar(25),
    value int
)
insert into YourTable (id, userid, variable, value) values (3115, 2287, 'votech05', 2)
insert into YourTable (id, userid, variable, value) values (3116, 2287, 'comcol05', 1)
insert into YourTable (id, userid, variable, value) values (3117, 2287, 'fouryr05', 1)
insert into YourTable (id, userid, variable, value) values (3118, 2287, 'none05', 2)
insert into YourTable (id, userid, variable, value) values (3119, 2287, 'ocol1_05', 2)
insert into YourTable (id, userid, variable, value) values (3120, 2287, 'disnone', 2)
insert into YourTable (id, userid, variable, value) values (3121, 2287, 'dissense', 2)
insert into YourTable (id, userid, variable, value) values (3122, 2287, 'dismobil', 3)
insert into YourTable (id, userid, variable, value) values (3123, 2287, 'dislearn', 3)
insert into YourTable (id, userid, variable, value) values (3124, 2287, 'disment', 3)
insert into YourTable (id, userid, variable, value) values (3125, 2287, 'disother', 2)
insert into YourTable (id, userid, variable, value) values (3126, 2287, 'disrefus', 7)

Andomar 2009-08-10 12:34:48

Not exactly, since I don't want the total count of each value, just how they are clustered, i.e., 2,1,2,2,1,1,2,2,2,2,1,1 would return value=2, count=4, not 7.

Jason Francis 2009-08-10 12:44:39

+1, works after edit, and way better than using a cursor!

KM 2009-08-10 15:27:02

This looks promising. Let me take a look at it. I'd rather not use cursors if I can help it (although with these types of inter-dependent problems, the performance might work out the same using a CTE). Thanks.

Jason Francis 2009-08-10 16:49:50

Answer 3

+1 A:

This may be one of those problems best solved with cursors. Give this a try. It should be close, but it's not tested, since you didn't provide CREATE TABLE and INSERT statements with sample data to make that easy.

declare @userid int
set @userid = 2287;
declare C cursor fast_forward for
select VALUE from T
where USERID = @userid
order by ID;

declare @value int, @prevvalue int;
declare @runcount int, @runlongest int;
set @runlongest = 0;
declare @valuelongest int;
open C;
fetch next from C into @value;
while @@fetch_status = 0 begin
  if @value = @prevvalue set @runcount = @runcount + 1 else set @runcount = 1;
  if @runcount > @runlongest begin
    set @runlongest = @runcount;
    set @valuelongest = @value;
  end;
  set @prevvalue = @value;
  fetch next from C into @value;
end;
select @userid as USERID, @valuelongest as VALUE, @runlongest as [COUNT];

close C;
deallocate C;

It won't be fast with 75M rows, but it probably won't be too slow, either. If your runs are very long, and you have the right indexes, you can do better by numbering the rows with row_number in a temp table, then using a WHILE loop that jumps through a run at a time. Let me know if you think that's worth looking at (and if you can, post CREATE TABLE and INSERT statements with sample data).

Steve Kass 2009-08-10 13:35:05
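
The temp-table alternative mentioned at the end of this answer could look roughly like the sketch below. It is not from the original poster and is untested against real data. It reuses the table and column names from the cursor example (T, ID, USERID, VALUE); the temp table #Numbered, its index name, and the variables @pos, @next and @last are hypothetical. It also assumes VALUE is never NULL. Each pass of the loop skips over one whole run instead of one row.

```sql
-- Sketch only: jump through the data one run at a time.
-- Assumes table T(ID, USERID, VALUE) as in the cursor example and non-NULL VALUEs.
declare @userid int;
set @userid = 2287;

-- Number the user's rows in ID order (#Numbered is a hypothetical name).
select rn = row_number() over (order by ID), VALUE
into #Numbered
from T
where USERID = @userid;

create unique clustered index IX_Numbered_rn on #Numbered (rn);

declare @pos bigint, @next bigint, @last bigint;
declare @value int, @valuelongest int, @runlongest bigint;
set @runlongest = 0;
set @pos = 1;
select @last = max(rn) from #Numbered;   -- NULL when there are no rows, so the loop is skipped

while @pos <= @last begin
  -- Value at the start of the current run.
  select @value = VALUE from #Numbered where rn = @pos;

  -- First later row with a different value; NULL if the run reaches the end.
  select @next = min(rn) from #Numbered where rn > @pos and VALUE <> @value;
  if @next is null set @next = @last + 1;

  -- The run length is the distance to the next change.
  if @next - @pos > @runlongest begin
    set @runlongest = @next - @pos;
    set @valuelongest = @value;
  end;

  set @pos = @next;   -- jump to the start of the next run
end;

select @userid as USERID, @valuelongest as VALUE, @runlongest as [COUNT];

drop table #Numbered;
```

Whether this beats the cursor depends on how long the runs are: with many short runs the loop executes almost once per row and gains little.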

Answer 4

A:

Hi Brian,

Without testing it, I think the following should work:

Row_number() over (partition by userid, value order by id)

Once this is done, just select the one with the highest row_number.

Please let me know if this worked!!

Thanks, Edi

eschlech 2009-08-10 21:27:31

Edi, row_number() won't work, because it will treat consecutive values the same way as nonconsecutive ones. The sequence of values is the issue here, not just how many there are.

Steve Kass 2009-08-10 22:05:49
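
To illustrate the comment above: a single row_number() partitioned by value keeps counting across separate runs of the same value, so its maximum is the total number of occurrences, not the longest run. One common workaround, which nobody in this thread posted, is to subtract that per-value number from an overall row number. The difference stays constant within a run and changes when a run is interrupted. The sketch below assumes the YourTable test data from Answer 2; the names Numbered and grp are hypothetical.

```sql
-- Sketch only: identify runs by the difference of two row numbers.
-- Assumes the YourTable test data from Answer 2.
with Numbered as (
  select value,
         -- position in the whole sequence minus position among rows with the same value;
         -- constant inside one run, different for separate runs of that value
         grp = row_number() over (order by id)
             - row_number() over (partition by value order by id)
  from YourTable
  where userid = 2287
)
select top 1 value, [Count] = count(*)
from Numbered
group by value, grp          -- grp alone is not unique across different values
order by count(*) desc;
```

On the sample data from Answer 2 this should return value 2 with a count of 4, the same top row as the query in Answer 2.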

So does this mean no order can be defined? Sorry, I do not get this.

eschlech 2009-08-10 22:41:35
