As some of you may know, using the LIMIT keyword with an offset in MySQL does not stop it from reading all of the preceding records.

For example:

SELECT * FROM my_table LIMIT 10000, 20;

This means that MySQL will still read the first 10000 records and throw them away before producing the 20 we are after.

So, when paginating a large dataset, high page numbers mean long load times.

Does anyone know of an existing pagination class/technique/methodology that can paginate large datasets more efficiently, i.e. one that does not rely on the MySQL LIMIT keyword?

In PHP if possible, as that is the weapon of choice at my company.

Cheers.

A: 

I'm not aware of the performance decrease you've mentioned, and I don't know of any other solution for pagination; however, an ORDER BY clause might help you reduce the load time.

Alix Axel
A: 

The best way is to define an indexed counter field in my_table and increment it for every newly inserted row. Then you can use WHERE YOUR_INDEX_FIELD BETWEEN 10000 AND 10020, which will be much faster.
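
A minimal sketch of that idea, assuming a gap-free counter column (called seq here purely for illustration) that is maintained on every insert:

-- Hypothetical: add and index a sequential counter column.
ALTER TABLE my_table ADD COLUMN seq INT NOT NULL, ADD INDEX idx_seq (seq);

-- 20 rows per page, page 501 = rows 10001-10020; the range is located via the
-- index instead of reading and discarding the first 10000 rows.
SELECT *
FROM my_table
WHERE seq BETWEEN 10001 AND 10020
ORDER BY seq;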

Vitaly Dyatlov
This requires that you never delete any rows, and that you're not displaying aggregates.
nos
A: 

Hi,

A solution might be to not use the limit clause, and use a join instead -- joining on a table used as some kind of sequence.

For more information: on SO, I found this question / answer, which gives an example -- that might help you ;-)
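
The linked answer isn't reproduced here, but a common pattern along these lines (a "deferred join", or late row lookup) is to pick only the ids of the requested page first and then join back for the full rows -- a rough sketch, assuming a primary key column id:

-- The inner query still walks the index past the offset, but it only touches ids;
-- the join then fetches the full rows for just the 20 ids of the requested page.
SELECT t.*
FROM my_table AS t
JOIN (
    SELECT id
    FROM my_table
    ORDER BY id
    LIMIT 10000, 20
) AS page ON page.id = t.id
ORDER BY t.id;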

Pascal MARTIN
Thanks mate I'll take a look!
Evernoob
A: 

There are basically 3 approaches to this, each of which have their own trade-offs:

  1. Send all 10000 records to the client, and handle pagination client-side via JavaScript or the like. The obvious benefit is that only a single query is needed for all of the records; the obvious downside is that if the record size is at all significant, the page sent to the browser will be proportionately large - and the user might not actually care about the full record set.

  2. Do what you're currently doing, namely SQL LIMIT, and grab only the records you need with each request, completely stateless. The benefit is that only the records for the currently requested page are sent, so requests are small; the downsides are that a) it requires a server request for each page, and b) it gets slower for later pages as the number of records/pages increases, as you mentioned. Using a JOIN or a WHERE clause on a monotonically increasing id field can sometimes help in this regard, specifically if you're requesting results from a static table rather than a dynamic query.

  3. Maintain some sort of state object on the server which caches the query results and can be referenced in future requests for a limited period of time. Upside is that it has the best query speed, since the actual query only needs to run once; downside is having to manage/store/cleanup those state objects (especially nasty for high-traffic websites).

Amber
A: 

Some other options:

  • Partition the tables per page range so you can ignore the LIMIT offset (a rough sketch follows below)
  • Store the results in a session (a good idea would be to create a hash of that data using md5, then use that to cache the session across multiple users)
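
For the first bullet, one possible reading is MySQL range partitioning on the id, so that a query restricted to one id window only has to touch one partition -- a rough sketch with made-up column names and boundaries:

-- Hypothetical table: range-partitioned by id so partition pruning limits
-- a WHERE id BETWEEN ... query to a single partition.
CREATE TABLE my_table_part (
    id INT NOT NULL,
    payload VARCHAR(255),
    PRIMARY KEY (id)
)
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (100000),
    PARTITION p1 VALUES LESS THAN (200000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);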
MANCHUCK
Not really a fan of storing such a large dataset in a session.
Evernoob
+2  A: 

First of all, if you want to paginate, you absolutely have to have an ORDER BY clause. Then you simply have to use that clause to dig deeper in your data set. For example, consider this:

SELECT * FROM my_table ORDER BY id LIMIT 20

You'll have the first 20 records; let's say their ids are 5, 8, 9, ..., 55, 64. Your pagination link to page 2 will look like "list.php?page=2&id=64" and your query will be

SELECT * FROM my_table WHERE id > 64 ORDER BY id LIMIT 20

No offset, only 20 records read. It doesn't allow you to jump arbitrarily to any page, but most of the time people just browse the next/prev page. An index on "id" will improve the performance, even with big OFFSET values.
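
If a "previous page" link is needed as well, the same trick works in reverse -- a sketch, assuming the id of the first record on the current page (5 in the example above) is passed back in the URL:

-- Previous page: take the 20 rows just below the first id currently shown,
-- then flip them back into ascending order for display.
SELECT *
FROM (
    SELECT *
    FROM my_table
    WHERE id < 5
    ORDER BY id DESC
    LIMIT 20
) AS prev_page
ORDER BY id;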

Josh Davis
The only thing this doesn't take into account is deleted rows (it assumes consistent numbering for the ids)...
Jeffrey Berthiaume
It's a nice idea, but yeah, I also need the page numbers, which means I'd have to either a) never delete any rows, or b) calculate and store somewhere the id of the last record on each page.
Evernoob
If you don't get enough rows, just request some more. Keep requesting in batches until you get enough, or you get 0 (which means there are no more rows). Make sure to keep track of where the actual end is so the next set of rows is contiguous with what was displayed to the user, not with what was retrieved.
longneck
That method has nothing to do with deleting rows and it does NOT assume that ids are consecutive, as shown in the example. You don't lose the page number if you pass it in the URL.
Josh Davis