views:

334

answers:

1

I'm using ASP.NET MVC and Azure Table Storage in the local development fabric. My pagination code is very slow when working with a large resultset:

var PageSize = 25;
var qResult2 = from c in svc.CreateQuery<SampleEntity>(sampleTableName)
                           where c.PartitionKey == "samplestring"
                           select c;
TableStorageDataServiceQuery<SampleEntity> tableStorageQuery = 
                 new TableStorageDataServiceQuery<SampleEntity>
                 (qResult2 as DataServiceQuery<SampleEntity>);
var result = tableStorageQuery.ExecuteAllWithRetries()
                                .Skip((page - 1) * PageSize)
                                .Take(PageSize);
var numberOfEntities = tableStorageQuery.ExecuteAllWithRetries().Count
ViewData["TotalPages"] = (int)Math.Ceiling((double) numberOfEntities / PageSize);
ViewData["CurrentPage"] = page;
return View(result);

The ViewData is used by the View to calculate paging links using code from Sanderson's MVC book. For an Azure Table with 1000+ entities, this is very slow. For starters, "Count" takes quite a long time to calculate the total number of entities. If I'm reading my LINQ book correctly, this is because the query doesn't implement ICollection. The book is "Pro LINQ" by Joseph Rattz.

Even if I set "numberOfEntities" to the known total (e.g. 1500), the paging is still slow for pages above 10. I'm guessing that .Skip and/or .Take are slow. Also, I call ExecuteAllWithRetries() twice, and that can't be helping if in fact Azure is queried twice.

What strategy should I follow for paging through large datasets with ASP.NET MVC and Azure?

EDIT: I don't need to know the exact total number of pages.

+2  A: 

Skip and Take aren't the problem here - they will be executed against the IEnumerable, which will already be in memory and thus very quick.

ExecuteAllWithRetries is likely to be the culprit here - you're basically retrieving all of the entities in the partition from the remote storage in this call, which will result in a very large payload.

Pagination in the manner you're showing is quite difficult in Table Storage. Here are a few issues:

  • The only order that's guaranteed is the PartitionKey/RowKey order, so you need to design your RowKeys with this in mind.

  • You can perform the Take in the query (ie, your qResult2), so this will reduce the number of entities going over the wire.

  • To perform the Skip-like functionality, you'll need to use a comparison operator. So you'll need to know where you are in the result set and query all RowKeys above that value (ie, add something like where c.RowKey > [lastRowKey] to your query)

  • There's no way to retrieve a count without keeping track of it yourself (or retrieving the entire table like you're already doing). Depending on your design, you could store the count along with each entity (ie, use an incrementing value) - but just make sure you keep track of concurrent edit conflicts, etc. If you do keep track of the count with each entity, then you can also perform your Skip using this as well. Another option would be to store the count in a single value in another entity (you could use the same table to ensure transactional behaviour). You could actually combine these approaches too (store the count in a single entity, to get the optimistic concurrency, and also store it in each entity so you know where it lies).

  • An alternative would be, if possible, to get rid of the count altogether. You'll notice a couple of large scalable sites do this - they don't provide an exact list of how many pages there are, but they might let you go a couple of pages ahead/back. This basically eliminates the need for count - you just need to keep track of the RowKeys for the next/prev pages.

Michael Hart
Thanks for your comments. You convinced me that I was going down the wrong path. I've now implemented paging as outlined on http://blog.smarx.com/posts/paging-over-data-in-windows-azure-tables and I'm happy enough. For my app, I don't need to display the total number of pages, so I got rid of "count" as you suggested. Dealing with continuation tokens was easier than I thought, and they handily take care of the "Skip" and "Take" functionality I had before. Thanks again.
Slack