views: 480

answers: 7

It's one of those things that seems to have an odd curve where the more I think about it, the more it makes sense. To a certain extent, of course. And then it doesn't make sense to me at all.

Care to enlighten me?

A: 

It isn't... is it?

SCdF
+17  A: 

Because in most cases you've got to sort your results first. For example, when you search on Google, you can view only up to 100 pages of results. They don't bother sorting by page rank beyond 1000 websites for a given keyword (or combination of keywords).

Pagination is fast. Sorting is slow.
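To make that concrete, here is a minimal sketch (the table and column names are hypothetical): even though the query below returns only ten rows, the server still has to order every row matching the WHERE clause before it can know which ten belong on page 5.

    -- Hypothetical articles table; only 10 rows come back,
    -- but the ORDER BY has to consider every matching row first.
    SELECT id, title
    FROM articles
    WHERE category = 'news'
    ORDER BY published_at DESC   -- the slow part: sorting
    LIMIT 10 OFFSET 40;          -- the fast part: pagination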

lubos hasko
+2  A: 

This is a really vague question. We'd need a concrete example to get a better idea of the problem.

Ryan Doherty
If you look at the title, the question makes sense; when you read the question itself, it stops making sense.
James McMahon
+3  A: 

Lubos is right: the problem is not the fact that you are paging (which takes a HUGE amount of data off the wire), but that you need to figure out what actually goes on the page.

The fact that you need to page implies there is a lot of data. A lot of data takes a long time to sort :)

Rob Cooper
+1  A: 

I thought you meant pagination of the printed page - that's where I cut my teeth. I was going to launch into a great monologue about collecting all the content for the page, positioning (a vast number of rules here; constraint engines are quite helpful) and justification... but apparently you were talking about the process of organizing information on webpages.

For that, I'd guess database hits. Disk access is slow. Once you've got it in memory, sorting is cheap.

Josh
A: 

Of course sorting on a random query takes some time, but if you're having problems with the same paginated query being used regularly, there's either something wrong with the database setup (improper indexing or none at all, too little memory, etc.; I'm not a DB manager) or you're doing pagination seriously wrong:

Terribly wrong: e.g. doing select * from hugetable where somecondition; into an array, getting the page count from the array length, picking the relevant indexes, discarding the array, and then repeating this for each page... That's what I call seriously wrong.

The better solution is two queries: one getting just the count, then another getting the results using LIMIT and OFFSET. (Some proprietary, nonstandard SQL server might have a one-query option, I dunno.)
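As a rough sketch of that two-query approach (the orders table and its columns are just made-up examples):

    -- Query 1: how many rows match, for rendering the page links.
    -- On most engines this can often be answered from an index alone.
    SELECT COUNT(*)
    FROM orders
    WHERE status = 'shipped';

    -- Query 2: only the rows for the requested page,
    -- e.g. page 3 with 20 rows per page (OFFSET = (3 - 1) * 20).
    SELECT id, customer_id, total
    FROM orders
    WHERE status = 'shipped'
    ORDER BY created_at DESC
    LIMIT 20 OFFSET 40;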

The bad solution might actually work quite okay on small tables (in fact it's not unthinkable that it's faster on very small tables, because the overhead of making two queries is bigger than getting all rows in one query; I'm not saying it is so...) but as soon as the database begins to grow the problems become obvious.

Stein G. Strindhaug
The combination of LIMIT with a large offset, and an ORDER BY or GROUP BY, can still be very resource intensive, which is why Google doesn't get a full count (anything over 1000 results and it's an 'estimate') nor paginate beyond the first 1000 results.
thomasrutter
+2  A: 

This question seems pretty well covered, but I'll add a little something MySQL specific as it catches out a lot of people:

Avoid using SQL_CALC_FOUND_ROWS. Unless the dataset is trivial, counting the matches and retrieving x of them in two separate queries is going to be a lot quicker. (If it is trivial, you'll barely notice a difference either way.)
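For illustration only (the posts table and author_id column are made up), the pattern being discouraged and the usually-faster alternative look roughly like this:

    -- Discouraged: SQL_CALC_FOUND_ROWS makes MySQL work out the full
    -- result set even though only 10 rows are actually returned.
    SELECT SQL_CALC_FOUND_ROWS id, title
    FROM posts
    WHERE author_id = 42
    LIMIT 10;
    SELECT FOUND_ROWS();

    -- Usually faster: a plain count (often satisfied from an index),
    -- plus a second query that fetches just the page of rows.
    SELECT COUNT(*) FROM posts WHERE author_id = 42;
    SELECT id, title FROM posts WHERE author_id = 42 LIMIT 10;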

JoeBloggs
A casual browse through SO after dinner, an intriguing tip, a 10 minute test and then a 10 minute tweak, et voila, my database load is halved on my heaviest site! Thanks!
jTresidder
It's a good tip. Doing the count in a separate query can avoid fetching row data and may use just the indexes. However, does it work as well in InnoDB as it does in MyISAM? I have a funny feeling it's different, but I could be wrong.
thomasrutter