views:

114

answers:

4

I am using SQL Server 2005.

SEE END OF THIS POST REGARDING BAD TABLE DESIGN.

I have two columns. I would like to group by the first column with respect to the order of the second column. Microsofts documentation states that GROUP BY clause does not care about order, how can I enforce this??

Here is my pseudo query:

SELECT col_1,
       MIN(col_2),
       MAX(col_2)
FROM someTable
GROUP BY col_1 (*** WITH RESPECT TO ORDER OF col_2***)

If I ran the query on the following table:

Col_1    Col_2
A       1
A       2
A       3
B       4
C       5
C       6
B       7
A       9

I should get the following results:

Col_1  Min   Max
A      1     3
B      4     4
C      5     6
B      7     7
A      9     9

The key part is that I CAN NOT have all 4 records of A lumped together in the result set. When the table/subquery is queried against, it is sorted by col_2, each new instance of col_1 should result in a new grouping. Thanks, I could not find anything on this.

I can do NOTHING with the table design. This was a table that was created by an outside vendor that is used with their proprietary software. I repeat I can do nothing about the table design!!!!

+2  A: 

A question like this almost always means that your database design is incorrect to return the results you need.

If items are to be grouped in a series, you need a field to specify that these records are part of the same group. Then you can group by both col_1 and the seried_id.

How you would determine at insert time which series the row belongs to is your problem and would depend on your business rules. You can never rely on data being stored sequentially, so you need to fix this process before you go any further or your data will NEVER give you correct results.

Personally If I had a group of records I wanted to store together and I was inserting them at the same time, I would insert them using a stored proc with a table variable so I could insert the set toegther and assign the series ID at that time. If they are not all done at the same time, you would have to check the last inserted record to detemine the seriesID which is problematic due to race conditions (you will also have to consider race conditions if they are all inserted together, they are just less difficult to deal with).

More details about what the data is would help us to know how to help you, a general col_1, col_2 is not helpful at this point. Knowing what kind of data this is might help us understand why you have the requirement and how to suggest ways we have seen this type of data handled.

HLGEM
I totally agree with the database design - however I don't have a choice since the database I am working with was created by an outside vendor. There is nothing I can do about this.
J Cooper
+1  A: 

A very common problem. If you google up "gaps and islands in sql", you will find multiple solutions.

AlexKuznetsov
A: 

I recreated your table and query and I got back:

col1    (No column name)    (No column name)
A   1   9
B   4   7
C   5   6

I noticed your query though had an additional comma after the MAX(col2) part which leads me to believe at this time that there is at least one more column in your 'real' query that is causing the results you put in the question. I think I need more information to be of further help.

DECLARE @SomeTable TABLE (Col1 CHAR(1), Col2 INT)

INSERT INTO @SomeTable(Col1, Col2) VALUES('A', 1)
INSERT INTO @SomeTable(Col1, Col2) VALUES('A', 2)
INSERT INTO @SomeTable(Col1, Col2) VALUES('A', 3)
INSERT INTO @SomeTable(Col1, Col2) VALUES('B', 4)
INSERT INTO @SomeTable(Col1, Col2) VALUES('C', 5)
INSERT INTO @SomeTable(Col1, Col2) VALUES('C', 6)
INSERT INTO @SomeTable(Col1, Col2) VALUES('B', 7)
INSERT INTO @SomeTable(Col1, Col2) VALUES('A', 9)

SELECT col1, 
       MIN(col2), 
       MAX(col2)
FROM @SomeTable 
GROUP BY col1
Mike Cheel
nope, the extra comma was just a typo, I corrected it.
J Cooper
did you run my sql above? what do I need to change in my sql to match yours? And if you have what matches mine then just add an ORDER BY at the end to sort the result.
Mike Cheel
+3  A: 
;WITH T AS(
SELECT Col1, 
       Col2,
       ROW_NUMBER() OVER (ORDER BY Col2) - 
              ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2) G
FROM YourTable
)
SELECT Col1, MIN(Col2) AS [Min], Max(Col2) AS [Max]
FROM T
GROUP BY Col1,G
ORDER BY [Min]
Martin Smith
+1 - well done, I was thinking of something similar...
Lucero
+1 Also, if there can be duplicates in Col2 (OP doesn't say one say, but implies that there are not duplicates) then this could break. `Rank()` or `dense_rank()` instead of `row_number()` would handle this case and the case with duplicates.
Shannon Severance
+1 very nice - I would like to test this query, i'm not that familiar with window functions
rsenna
Seems to be what I was looking for, It works perfectly on my test table- Now I'm just trying to apply it to my slightly more complicated production scenario.
J Cooper
Thanks a bunch. Got it to work! - at least I think so. Thanks to Shannon Serverance too for the tip, I did end up having to use the Rank() function.
J Cooper