views: 104
answers: 2

I'm writing a web application that can return very large result sets for a search query; some queries will return 10,000 items. I'd like to show those to users paginated; no problem so far: each page will be the result of a query with an appropriate LIMIT clause. But I'd also like to show clues about the results on each page of the paginated query: some data from its first item and some from its last. This means that, for example, with a result of 10,000 items and a page size of 50, if the user asks for the first page I will need:

  • the first 50 items (the page requested by the user)
  • items 51 and 100 (the first and last of the second page)
  • items 101 and 150 (the first and last of the third page)

and so on.

For efficiency reasons I want to avoid one query per row.

[edit] I would also prefer not to download all 10,000 results when I only need about 50 + (10,000 / 50) × 2 = 450 rows.

The question is: is there a single query I can issue to the RDBMS (MySQL, in my case, but I'd prefer a cross-database solution) that will return only the data I need?

I can't use server-side cursors, because not all databases support them and I want my app to be database-agnostic.

A: 

UPDATE: I completely misread the initial question. You can do this using UNION and the LIMIT clause in MySQL, although it might be what you meant by "one query per row". The syntax would be like:

(select FOO from BAZ order by FOO limit 50)
    union all
(select FOO from BAZ order by FOO limit 50, 1)
    union all
(select FOO from BAZ order by FOO limit 99, 1)
    union all
(select FOO from BAZ order by FOO limit 100, 1)
    union all
(select FOO from BAZ order by FOO limit 149, 1)

and so on and so forth. Since you're using UNION, you'll only need one roundtrip to the database. I'm not sure how MySQL will treat the various SELECT statements, though. It should be able to recognize that they are essentially the same query and use a cached query plan, but I don't work with MySQL enough to know if that's a reasonable expectation for its optimizer.

Obviously, to build this query in a general fashion, you'll first need to run a count query so you can calculate what your offsets will be.
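That query-building step can be sketched as follows, in Python against an in-memory SQLite database (SQLite shares MySQL's `LIMIT offset, count` syntax here; the table and column names `items`/`id` are illustrative, not from the question):

```python
import sqlite3

def build_preview_query(total_rows, page_size, table="items", col="id"):
    """Build one UNION ALL query: the full first page, then the first
    and last row of every following page. Names are illustrative."""
    parts = [f"SELECT {col} FROM {table} ORDER BY {col} LIMIT {page_size}"]
    for first in range(page_size, total_rows, page_size):  # 0-based offsets
        last = min(first + page_size, total_rows) - 1
        for offset in ([first] if first == last else [first, last]):
            parts.append(
                f"SELECT {col} FROM {table} ORDER BY {col} LIMIT {offset}, 1")
    # Wrap each SELECT so its ORDER BY/LIMIT survive inside the compound query.
    return " UNION ALL ".join(f"SELECT * FROM ({p})" for p in parts)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO items (id) VALUES (?)",
                [(i,) for i in range(1, 201)])  # 200 rows, i.e. 4 pages of 50

(total,) = con.execute("SELECT COUNT(*) FROM items").fetchone()  # the count query
rows = [r[0] for r in con.execute(build_preview_query(total, 50))]
# 56 rows come back instead of 200: the first page plus the edges of pages 2-4
```

Note that the whole thing is still a single round trip: one COUNT, then one compound SELECT.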

This is definitely not a tractable problem for standard SQL, since the paging logic requires nonstandard features.

Hank Gay
Sorry, I misread your question (morning coffee still brewing). I will adjust my answer.
Hank Gay
Good point: this would minimize the dependency on the db used; only the LIMIT syntax would have to be adjusted. Paging logic requires nonstandard features. Each time I stumble upon this I am really surprised: how could such a basic construct not be standardized?
silviot
The problem here is that the query cannot be parameterized. What if I want this for 100 pages? Or if I want the 100th page along with the previous and next one? Then you have to create the SQL on-the-fly.
Dercsár
@Dercsár So far as I know, there really isn't a way to avoid building the query on the fly short of processing ALL the rows in your business logic and plucking out the ones you care about.
Hank Gay
@silviot At this point, I'm not sure what good adding it to the standard would be; most implementations lag behind the standard by years. Just imagine trying to write this in Oracle: it doesn't even support limit, you have to use a nested query and `rownum` hackery.
Hank Gay
+2  A: 

Just for fun, here is the MSSQL version of it.

declare @pageSize int; set @pageSize = 10;
declare @pageIndex int; set @pageIndex = 0; /* first page */

WITH x AS
(
    select
        ROW_NUMBER() OVER (ORDER BY created ASC) AS RowNumber,
        *
    from [table]
)
SELECT * FROM x
WHERE
    (RowNumber BETWEEN @pageIndex * @pageSize + 1 AND (@pageIndex + 1) * @pageSize)
    OR RowNumber % @pageSize = 1   /* first row of each page */
    OR RowNumber % @pageSize = 0   /* last row of each page */

Note that the ORDER BY is provided in the OVER clause.
Also note that if you have a gazillion rows, your result set will still contain millions of first/last markers; for practical reasons you will need to cap the number of result rows.
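The same first/last-of-page filter can be exercised outside MSSQL, since `ROW_NUMBER()` is a standard window function; a minimal Python sketch against SQLite (3.25+, which added window functions), with illustrative table/column names:

```python
import sqlite3

PAGE_SIZE = 10
PAGE_INDEX = 0  # first page

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created INTEGER)")
con.executemany("INSERT INTO items (id, created) VALUES (?, ?)",
                [(i, 1000 + i) for i in range(1, 31)])  # 30 rows = 3 pages

sql = """
WITH x AS (
    SELECT ROW_NUMBER() OVER (ORDER BY created ASC) AS rn, id
    FROM items
)
SELECT id FROM x
WHERE (rn BETWEEN :lo AND :hi)   -- the requested page in full
   OR rn % :ps = 1               -- first row of every page
   OR rn % :ps = 0               -- last row of every page
ORDER BY rn
"""
params = {"lo": PAGE_INDEX * PAGE_SIZE + 1,
          "hi": (PAGE_INDEX + 1) * PAGE_SIZE,
          "ps": PAGE_SIZE}
rows = [r[0] for r in con.execute(sql, params)]
# rows: 1..10 (the requested page) plus 11, 20, 21, 30 (edges of pages 2 and 3)
```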

I have no idea how this could be solved in generic SQL. (My bet: no way. Even simple paging cannot be solved without DB-specific operators.)

Dercsár
I should investigate the efficiency of this solution compared to the (simpler) one provided by Hank Gay. Thanks.
silviot