views:

71

answers:

3

I need to be able to store a large list of ordered items in the DB. So far that's straight-forward:

ID Position OtherFields
 1     45      ...
 2   4736      ...
 3    514      ...
 ...

In queries, I always need to get just a few items (filtered based on OtherFields) but in the correct order. Easy as well, putting an index on Position and using "order by Position".

Now the problem: Items change their Position frequently, and not just by 1 or 2. If ID 2 changes the Position from 4736 to 2000, I need to update its Position and the Position of all elements between old Position 2000 and 4735, adding 1 in each row. And it's not only one ID that changes per transaction but a few, and there can be many transactions within a short time.

I think the most elegant way to handle the update problem would be using a linked list instead of a Position column where I can just remove ID 2 from its old position by linking its predecessor to its successor and then insert it elsewhere by linking it between its new predecessor and successor. That would be a constant and small number of updates per Position change and it would also be my preferred way of handling the changes (in Java in my case). However this raises the N+1 problem for querying in the correct order - even for a few elements, I have to go through the whole list in the worst case for finding out their correct order.

So my question is: What would you recommend to get a good balance between necessary updates and query performance?

So far I see two promising directions:

  1. Is there a DBMS (ideally OpenSource) that can handle linked lists not only with syntactic sugar but also with a good performance, e.g. by using internal Indices for the linked elements?

  2. Maybe it would also be an option to just have a BLOB where the whole Linked List would be stored in! How big could such a Linked List get / how much memory would it use in the DB and when fetched for let's say 1.000.000 entries? I'm using Java + Hibernate in case it matters. I imagine that processing even the whole list in memory after fetching the BLOB should be pretty fast!?

But of course other ideas are welcome as well!

+4  A: 

If you relax the constraint that the Position column must contain integers from 1 to N and instead allow it to contain any numbers then you can do both searches and updates efficiently.

You can insert an item between two other items with position A and B by calculating the average (A + B) DIV 2. For example if A is 10000 and B is 12000 then your new position is 11000. Occasionally you will run out of gaps due to clustering, at which point you can run through the whole table redistributing the positions more evenly.

Mark Byers
I'm just throwing this out there. Can Position be a float column? You can always divide a number in half. :)
Marcus Adams
@Marcus Adams: It could, but you still would eventually hit the same issue of not having a gap due to precision and rounding errors.
Mark Byers
@Mark, It was fun to think about anyway.
Marcus Adams
That makes totally sense, I think I'll try that with a BIGINT. Just read that unsigned it goes from 0 to 18.446.744.073.709.551.615. Even starting with a list of 1.000.000 equally distributed entries, this would leave a "space" of approx... 18.446.744.073.709 entries inbetween, giving some time before the first collision. In the worst case (all updates always go into the smallest gap which is always divided in 2), that should be 44 updates before a necessary redistribution, I think (2^44 = 17.592.186.044.416 < 18.446.744.073.709).
werner5471
A: 

The following might be helpful. It does not answer your questions directly, but could shed some light on how to do this (if it is possible for your requirements):

http://stackoverflow.com/questions/2195257/ranking-entries-in-mysql-table

del.ave
A: 

What about using a decimal for position? If you do, you could use the following method to put it between other positions:

Original records are:

ID    Position  Otherfields
--------------------------
1     1.0
2     2.0
.
.
.
5000  5000.0

Then say you move ID 1 to just before 5000

ID    Position  Otherfields
--------------------------
1     4999.9
2     2.0
.
.
.
5000  5000.0

Now lets say you want to put ID 2 between 1 and 5000:

ID    Position  Otherfields
--------------------------
1     4999.9
2     4999.91
.
.
.
5000  5000.0

This way you are only changing one record...

UPDATE:

After re-reading @Mark Byers' suggestion it appears that our solutions are very similar, though using a decimal seems much simpler to me...

Abe Miessler
You've re-invented Dewey Decimal.
Marcus Adams
Heh, yes basically
Abe Miessler