Here's the problem I'm trying to solve:
I need to be able to display a paged, sorted table of data that is stored across several database shards.
Paging and sorting are well known problems that most of us can solve in any number of ways when the data comes from a single source. But if you're splitting your data across shards or using a DHT or distributed document database or whatever flavor of NoSQL you prefer, things get more complicated.
Here's a simple picture of a really small data set:
Shard | Data
1 | A
1 | D
1 | G
2 | B
2 | E
2 | H
3 | C
3 | F
3 | I
Sorted into pages (Page Size = 3):
Page | Data
1 | A
1 | B
1 | C
2 | D
2 | E
2 | F
3 | G
3 | H
3 | I
And if we wanted to show the user page 2, we'd return:
D
E
F
If the size of the table in question is something like 10 million rows, or 100 million, you can't just pull down all the data onto a web/application server to sort it and return the correct page. And you obviously can't let each individual shard sort and page its own slice of the data because the shards don't know about each other.
To complicate matters, the data I need to present can't be too far out of date, so pre-calculating a set of useful sorts ahead of time and storing the results for later retrieval isn't practical.