views:

405

answers:

1

I'm storing an ordered list of several million items in a MySQL database. Reasonably often, items need to be added or removed from the list; equally often, the position within the list of an item must be determined. I'd say the read/write ratio is about 50:50.

Starting with a linked-list model, I read [1] and the various models discussed there. For a strict linked-list, the adjacency list model would work just fine, but since the read/write ratio is more or less equal, I went for a divide and conquer approach using standard contiguous lists:

Divide the entire list into 'buckets' of approximate length (say ~10000), maintaining an index of bucket sizes and their relative position within the main list. Each item is assigned to a specific bucket and keeps track of its position within that bucket.

With this approach, an item's position is determined by summing the sizes of the buckets that preceed the item's bucket in the list, then adding the item's position within its own bucket. To insert/remove an item from the list, the 'shifting' of items that results is localized to the bucket into which an item is being added or removed; the size of that bucket must also be updated accordingly.

There is some denormalization in this approach (the bucket sizes), and it is not inherently thread-safe, even with transactions, because during a remove/insert the table of items must be queried to determine the bucket position of the item being modified, and then updated to perform the 'shift' on all the other items in that item's bucket. Unless these actions are atomic (via stored procedure maybe?) threads consistently deadlock.

Are there any more approriate approaches to keeping this kind of data in an RDBMS? The thread-safety issue is causing me a big headache and it feels like there ought to be a better way to solve this problem than forcing me into using stored procedures.

Many thanks, Matt.

[1] http://stackoverflow.com/questions/935098/database-structure-for-tree-data-structure

+1  A: 

If you need a linked list (not a hierarchy), you can just use the approach described in this article in my blog:

, with this simple query:

SELECT  @r AS _parent,
        @r := (
        SELECT  id
        FROM    t_list
        WHERE   parent = _parent
        ) AS id
FROM    (
        SELECT  @r := 0
        ) vars,
        t_list

Make sure your id and parent have UNIQUE indexes defined for this to be efficient.

Replace @r := 0 with @r := @id_of_record_to_start_with to start browsing from any given id.

To find out the item's position, just reverse the query:

SELECT  COUNT(*)
FROM    (
        SELECT  @r AS _id,
                @r := (
                SELECT  parent
                FROM    t_list
                WHERE   id = _id
                ) AS id
        FROM    (
                SELECT  @r := @item_id
                ) vars,
                t_list
        ) q
Quassnoi