tags:

views:

827

answers:

3

I have the following stored procedure which returns A, B, and the count in descending order. I am trying to use ROW_NUMBER, so I can page the records, but I want the first row number 1 to be the record with the highest count, so basically, if I return a table with 3 records and the count is 30, 20, 10, then row number 1 should correspond with count 30, row number 2 should correspond with count 20, and row number 3 should correspond with count 10. dbo.f_GetCount is a function that returns a count.

CREATE PROCEDURE dbo.Test
AS  
@A nvarchar(300) = NULL,
@B nvarchar(10) = NULL
AS
SELECT @A = NULLIF(@A,''), @B = NULLIF(@B,'')

SELECT h.A, hrl.B,
       dbo.f_GetCount(hrl.A,h.B) as cnt
FROM dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
  AND (@B IS NULL OR hrl.B = @B)
GROUP BY hrl.B, h.A
ORDER BY cnt desc
A: 
SELECT h.A, hrl.B,
       dbo.f_GetCount(hrl.A,h.B) as cnt,
ROW_NUMBER() over (order by cnt desc) as row_num
FROM dbo.hrl
INNER JOIN dbo.h on h.C = hrl.C
WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
  AND (@B IS NULL OR hrl.B = @B)
GROUP BY hrl.B, h.A
ORDER BY cnt desc

This should do the trick. I don't have SSMS in front of me to test, but you MAY have to substitute the usage of 'cnt' in the ROW_NUMBER's order by clause with a second call to the function, but this should give you the general idea.

Adam Robinson
This gives me the error: Incorrect syntax near the keyword 'over'
Xaisoft
Sorry, should be ROW_NUMBER()
Adam Robinson
I tried that and it gave me the same error.
Xaisoft
+3  A: 
WITH q AS
        (
        SELECT h.A, hrl.B,
              dbo.f_GetCount(hrl.A,h.B) as cnt
        FROM dbo.hrl
        INNER JOIN dbo.h on h.C = hrl.C
        WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
          AND (@B IS NULL OR hrl.B = @B)
        GROUP BY hrl.B, h.A
        )
SELECT  q.*, ROW_NUMBER() OVER (ORDER BY cnt DESC) AS rn
FROM    q
ORDER BY rn DESC

To retrieve first 10 rows, use:

WITH q AS
        (
        SELECT h.A, hrl.B,
              dbo.f_GetCount(hrl.A,h.B) as cnt
        FROM dbo.hrl
        INNER JOIN dbo.h on h.C = hrl.C
        WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
          AND (@B IS NULL OR hrl.B = @B)
        GROUP BY hrl.B, h.A
        )
SELECT  TOP 10 q.*, 
        ROW_NUMBER() OVER (ORDER BY cnt DESC, A, B) AS rn
FROM    q
ORDER BY cnt DESC, A, B

To retrieve rows between 11 and 20, use:

SELECT  *
FROM    (
        WITH q AS
                (
                SELECT h.A, hrl.B,
                      dbo.f_GetCount(hrl.A,h.B) as cnt
                FROM dbo.hrl
                INNER JOIN dbo.h on h.C = hrl.C
                WHERE (@A IS NULL OR h.A like '%' + @A + '%') 
                  AND (@B IS NULL OR hrl.B = @B)
                GROUP BY hrl.B, h.A
                )
        SELECT  q.*, 
                ROW_NUMBER() OVER (ORDER BY cnt DESC, A, B) AS rn
        FROM    q
        ) qq
WHERE rn BETWEEN 11 AND 20
ORDER BY cnt DESC, A, B
Quassnoi
Just curious, what exactly is the WITH for? Does it improve performance?
Xaisoft
In this case, it was simplier to copy and paste using with :) It's useful for CTE's, in other ways it behaves same as a subquery.
Quassnoi
I am going to update my procedure in my question because it was actually giving me an error saying I need to terminate my statement, so maybe you can update your answer when I update my question. Thanks so far.
Xaisoft
Append a semicolon to your previous statement, right before WITH.
Quassnoi
Ok, I ran it with a semicolon after the first select statement and I get two errors: Incorrect Syntax near (, the one after the WITH and Incorrect Syntax near the keyword ORDER
Xaisoft
The error above is the ORDER BY cnt desc
Xaisoft
Sure, my fault. See updated post.
Quassnoi
Ok, I made the change and now I get the error Incorrect Syntax near keyword 'as'. The 'as' where it says ') AS q'
Xaisoft
I changed this also, copy the whole new query.
Quassnoi
What part did you change this time?
Xaisoft
Damn. Try now, it should work :)
Quassnoi
lol, ok thanks, i was looking at it and trying to figure out what you changed.
Xaisoft
+1 I do prefer "with" with CTE.
Sung Meister
Where would I put the where clause? I did the one for Casper's, but it takes 8 seconds to return 10 records.
Xaisoft
@Xaisoft: You can put it either in the CTE expression or on the outer query between "from q" and "order by"
Sung Meister
Yes, I tried 'WHERE rn between 1 and 10', but it says rn is an invalid column name.
Xaisoft
I am actually trying to do what is in the following link so I can speed up paging for my GridView in asp.net:http://www.davidhayden.com/blog/dave/archive/2005/12/30/2652.aspx
Xaisoft
If I do Top 10, it actually returns the records with the lowest count first because your result set returns the records in ascending order, so what I actually need is where rn between 1 and 10, but can't get that to work
Xaisoft
Just checked, my query returns top 10 HIGHEST counts, numbered from 1 to 10 in RN field.
Quassnoi
Yes that is right, it does match rn 1 with the highest count and rn 2 with the second highest, but it is displaying with with the cnt going in ascending order, so it the first row will have a cnt of 1 and an rn of 137
Xaisoft
Any idea how I can implement the where clause?
Xaisoft
Did you put "ORDER BY cnt DESC" instead of "ORDER BY rn"
Quassnoi
Ok that was it, now all that is missing is the WHERE clause? Any ideas?
Xaisoft
WHERE clause for what? To limit by rn? If you didn't forget TOP 10, then the query will return only TOP 10 rows without additional clauses.
Quassnoi
For example, I want to return only the rows between 11 and 20 or between 21 and 30.
Xaisoft
See updated post
Quassnoi
@Quassnoi: Using the TOP syntax is incorrect for two reasons. The first is performance, you might actually cause a second ordering on the result set (in addition to ROW_NUMBER).
casperOne
@Quassnoi: The second is if you have two groups with the same count, the groups are not guaranteed to be returned in the same order between the ROW_NUMBER and ORDER BY operations. You have to have another field to uniquely identify the groups (which is why you should order on ROW_NUMBER)
casperOne
@casperOne: first, SQL Server is smart enough to see that ORDER BY and ROW_NUMBER() use the same sort order and not to sort twice.
Quassnoi
@casperOne: second, my second query does not use TOP at all, it orders the results already returned and filtered by ROW_NUMBER()
Quassnoi
@casperOne: and third, you'd better correct the same things in your own answer before criticizing and downvoting mine.
Quassnoi
@Quassnoi: While SQL Server might be smart enough, that is an implementation detail of the query optimizer, and general programming practice dictates you not rely on implementation details. And yes, you do use TOP, in the second query, to select the next 10 rows.
casperOne
In the third query, I meant
Quassnoi
@Quassnoi: Rather, I meant the first ten rows (in the second query).
casperOne
@Quassnoi: Neither of my recommendations suffers from any of the points I made out about your queries. I do not use the TOP syntax, and I'm always ordering on unique identifiers for the rows.
casperOne
@casperOne: Where exactly do you order on unique identitiers in "order by t.cnt desc"?
Quassnoi
@Quassnoi: Your statement about downvoting is meaningless and adds nothing to the conversation, nor was I out of line to downvote something that I believe to be in error (which is separate from my own ideas on how best address the question).
casperOne
@casperOne: And what makes you sure "ORDER BY t.cnt" will persist between the queries?
Quassnoi
@Quassnoi: The "order by t.cnt desc" is done just once in the context of the query and then assigned a unique number on which future orderings are performed.
casperOne
@Quassnoi: In your examples, you order on the count *twice* which leads to the possibility of incorrect ordering between the row number and the order by in the *same* query.
casperOne
@Quassnoi: Yes, you are right, my query will not maintain the same order between queries, but within the same query, it will, which is what yours are lacking due to the double ordering.
casperOne
@casperOne: you have two pages. To retrieve them, you issue two queries: BETWEEN 1 AND 10 and BETWEEN 11 AND 20. Let's assume that CNT is equal for all rows. What makes you sure the second query will not miss or double anything if you just order by CNT?
Quassnoi
Please show me where exactly there is such a possibility, given that TOP 10 does not rely on ROW_NUMBER() at all (in the 2nd query), and ORDER BY happens only AFTER filtering by ROW_NUMBER() (in the 2rd query)
Quassnoi
@Quassnoi: I've updated my post to show how you can guarantee order across multiple queries on the same set of data. To your first question, it won't miss or double anything, but it will not guarantee the order applied to ROW_NUMBER is the order applied to rows in the ORDER BY statement.
casperOne
@Quassnoi: That's an implementation detail (assuming all counts are equal, how items are ordered between the separate operations). As to your second question, it's the same answer. The way that items are ordered in a ROW_NUMBER operation and ORDER BY are not guaranteed to be the same.
casperOne
@Quassnoi: Because all counts are equal (in your example), the ORDER BY statement is not guaranteed to return those records in the same order as the ROW_NUMBER function. It should be noted that you have updated your answer to address the ordering across multiple queries.
casperOne
Why do you need the same order on ORDER BY and ROW_NUMBER()? For what purpose? If you use TOP, then the ROW_NUMBER() is not used at all. If you filter by ROW_NUMBER(), the the results will be ordered only AFTER they are filtered. Besides, i've updated my post to order on A and B along with CNT.
Quassnoi
@Quassnoi: You've edited your post to reflect that. My comments stemmed from the older version of your queries which suffered from the issues I pointed out. Since you have corrected the issues I've removed the downvote, but they WERE issues (otherwise, why would you have edited it? =)
casperOne
Sure they were. What I initially wanted to say my query was no worse than yours :)
Quassnoi
+2  A: 

I would use a sub-query to get the values of the function into the result, and then the ROW_NUMBER ranking function, like so:

select
    ROW_NUMBER() over (order by t.cnt desc) as RowId, t.*
from
    (
        SELECT
            h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
        FROM
            dbo.hrl
                INNER JOIN dbo.h on h.C = hrl.C
        WHERE 
            (@A IS NULL OR h.A like '%' + @A + '%') AND 
            (@B IS NULL OR hrl.B = @B)
        GROUP BY
            hrl.B, h.A
    ) as t
order by
    1

If you wanted only a certain section of results (say, for paging), then you would need another subquery, and then filter on the row number:

select
    t.*
from
    (
        select
            ROW_NUMBER() over (order by t.cnt desc) as RowId, t.*
        from
            (
                SELECT
                    h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
                FROM
                    dbo.hrl
                        INNER JOIN dbo.h on h.C = hrl.C
                WHERE 
                    (@A IS NULL OR h.A like '%' + @A + '%') AND 
                    (@B IS NULL OR hrl.B = @B)
                GROUP BY
                    hrl.B, h.A
            ) as t
    ) as t
where
    t.RowId between 1 and 10
order by
    t.RowId

Note that in this query, you could put ROW_NUMBER anywhere in the select list, since you are no longer reliant on using the "order by 1" syntax for the order by statement.

There is a subtle issue here when calling this query multiple times. It is not guaranteed that the order in which the records are returned are going to be consistent if the number of items in each group is not unique. In order to address this, you have to change the ROW_NUMBER function to order on the fields that make up the group in the count.

In this case, it would be A and B, resulting in:

select
    t.*
from
    (
        select
            ROW_NUMBER() over (order by t.cnt desc, t.A, t.B) as RowId, t.*
        from
            (
                SELECT
                    h.A, hrl.B, dbo.f_GetCount(hrl.A,h.B) as cnt
                FROM
                    dbo.hrl
                        INNER JOIN dbo.h on h.C = hrl.C
                WHERE 
                    (@A IS NULL OR h.A like '%' + @A + '%') AND 
                    (@B IS NULL OR hrl.B = @B)
                GROUP BY
                    hrl.B, h.A
            ) as t
    ) as t
where
    t.RowId between 1 and 10
order by
    t.RowId

This ends up ordering the results consistently between calls when the count of the items between groups is not unique (assuming the same set of data).

casperOne
why the order by 1?
Xaisoft
So far, yours has worked, but I am still curious about the order by 1 and if I wanted to return rows 1 thru 10, where would I put the where clause?
Xaisoft
@Xiasoft: To order by the first field. It is possible that you will have counts that are the same. In order to get around that, you have to put more order fields in the over (order by...) section. This will affect the row_number, which is the only thing you should order on (hence the 1).
casperOne
So 1 is short t-sql syntax for basically saying order by RowId
Xaisoft
No, it's a shortcut for saying "order by first field in the SELECT list"
Quassnoi
Ok, in this case, isn't the first field, RowId?
Xaisoft
Why are you performing a subselect here...?
Adam Robinson
And unless I've missed something fundamental (entirely possible), "order by 1" is a meaningless statement. It's going to order by a constant value of 1 for every row, meaning that it will not perform any ordering at all.
Adam Robinson
i thought ordering by column index was an obsolete practice...
Sung Meister
@Adam: It is a valid syntax in SQL Server. It means ordering by the column index.
Sung Meister
@Adam Robinson: Subselect is to prevent the function from possibly being called twice, since you would have to put it in the ranking function and on it's own. It's true the optimizer might only make the call once, but that's more of an implementation detail that should not be relied on.
casperOne
@Adam Robinson: You are incorrect about the order by. The "order by 1" as Quassnoi indicates means order by the first field in the SELECT list, in this case, the RowId, but I don't believe I can reference RowId in the order by, since it is a calculated value (I could be wrong here).
casperOne
Caspers procedure works so far, but I am even more confused now.
Xaisoft
I stand corrected, it does indeed to that. Still seems like a rather obtuse practice.
Adam Robinson
@Xaisoft: ROWID is a reserved word in Oracle. For an Oracle user "order by ROWID" means "keep implicit order" as the rows are returned in ROWID order from table scans. This confused me when I answered. In your case, yes, ORDER BY 1 and ORDER BY RowID are synonyms.
Quassnoi
@Xaisoft: See my updated post for how to filter based on the result of ROW_NUMBER.
casperOne
Casper, I added your changes, but it takes a while to return 10 records. Without the where clause, it actually runs faster when returning 137 records.
Xaisoft
@Xiasoft: When using the ROW_NUMBER ranking function, it requires a full processing of the items in the table, since they have to be ordered. You might want to consider using temp tables and indexes on those tables if you have a large set of data you need to order like this.
casperOne
ah, i was afraid of that, i thought temp ROW_NUMBER was gonna help the user get away from temp tables as noted in this post:http://www.davidhayden.com/blog/dave/archive/2005/12/30/2652.aspx
Xaisoft