ansaurus

Question

Answer 1

+2 A:

You can use the count function! This will require you to use the group by clause, where you tell count how to break up, or group, itself. Gropu by is used for any aggregate function in SQL.

select
    prop1,
    prop2,
    prop3,
    prop4,
    count(*) as count
from
    tbl
group by
    prop1,
    prop2,
    prop3,
    prop4,
    y,
    x
order by y, x

Update: The OP mentioned these are ordered by y and x, not part of the result set. In this case, you can still use y and x as part of the group by.

Keep in mind that order means nothing if it doesn't have ordering columns, so in this case, we have to respect that with y and x in the group by.

Eric 2009-09-07 17:12:57

This may work, but a key element of his question involves "repetitive" rows of data, and it's not clear what the pattern of inserts into this table is.

David Andres 2009-09-07 17:22:21

This doesn't do as required, it will group rows that are not ajacent to each other, and as I indicated as part of the question - this is highly important. Is there some way in which a windowing function could be used in conjunction with this? (I know very little about them). Thanks.

Kieran Benton 2009-09-07 17:34:13

@Kieran: Ordering doesn't matter in SQL Server, unless you have columns you're ordering by. You didn't indicate that's what you were doing in the question. However, you did in the comments, and so I changed it into something that works the way you want it to.

Eric 2009-09-07 18:08:00

This will work, if the y, x ordering remains consistent among the repetitious rows.

David Andres 2009-09-07 18:21:08

@dandres: What? Without a column basis for ordering, SQL Server does not return rows in a consistent order. Without a consistent ordering set, you can't expect to have anything with regards to "adjacent" be meaningful.

Eric 2009-09-07 18:46:29

All I mean to say is that y, x could be anything. They could be an incremental values assigned by order of transaction. For example, the first four rows of his data could have y, x = [{0, 0}, {1, 1}, {2, 2}, {3, 3}]. This will ORDER BY correctly, but won't GROUP BY all that because the last two pairs are part of the same repetition. The OP hasn't been specific about the traits of the y, x ordering.

David Andres 2009-09-07 18:53:38

Eric, that can happen is the prop1...prop4 can be the same across several x,y values and you want those to be counted together. At extreme it can be that entire 70k rows have the same prop1..prop4 but different x,y, u return 70k count()=1 as opposed to a single count()=70k.

Remus Rusanu 2009-09-07 19:03:45

Answer 2

+1 A:

This will work, though it is painful to look at:

;WITH Ordering
AS
(
  SELECT Prop1,        
  Prop2,        
  Prop3,        
  Prop4,
  ROW_NUMBER() OVER (ORDER BY Y, X) RN
  FROM Props
)
SELECT 
  CurrentRow.Prop1, 
  CurrentRow.Prop2, 
  CurrentRow.Prop3, 
  CurrentRow.Prop4, 
  CurrentRow.RN - 
    ISNULL((SELECT TOP 1 RN FROM Ordering O3 WHERE RN < CurrentRow.RN AND (CurrentRow.Prop1 <> O3.Prop1 OR CurrentRow.Prop2 <> O3.Prop2 OR CurrentRow.Prop3 <> O3.Prop3 OR CurrentRow.Prop4 <> O3.Prop4) ORDER BY RN DESC), 0) Repetitions
FROM Ordering CurrentRow
LEFT JOIN Ordering O2 ON CurrentRow.RN + 1 = O2.RN
WHERE O2.RN IS NULL OR (CurrentRow.Prop1 <> O2.Prop1 OR CurrentRow.Prop2 <> O2.Prop2 OR CurrentRow.Prop3 <> O2.Prop3 OR CurrentRow.Prop4 <> O2.Prop4) 
ORDER BY CurrentRow.RN

The gist is the following:

Enumerate each row using ROW_NUMBER OVER to get the correct order.
Find the maximums per cycle by joining only when the next row has different fields or when the next row does not exist.
Figure out the count of repetitions is by taking the current row number (presumed to be the max for this cycle) and subtracting from it the maximum row number of the previous cycle, if it exists.

David Andres 2009-09-07 17:33:29

This returns multiple rows. You'd need a very expensive `distinct` to filter them out.

Eric 2009-09-07 17:40:31

This looks to be towards what I was thinking (afraid my knowledge of windowing functions is a bit fuzzy) but I cant get this to execute, missing * in the COUNT() for a start but SQL 2005 at least doesnt seem to like both a PARTITION BY and ORDER BY clause in the OVER. Am I missing something? Cheers.

Kieran Benton 2009-09-07 17:46:06

I am working on refining this query, because it does return duplicates and unfortunately the COUNT OVER statement doesn't support ordering. I'll write back with a better solution.

David Andres 2009-09-07 17:49:12

the above works with limited testing

David Andres 2009-09-08 01:54:30

ansaurus

tags:

views:

answers:

SQL Server cross-row compression

related questions