ansaurus

Question

Data Structure for storing a sorting field to efficiently allow modifications

Answer 1

+1 A:

It seems to me that your real problem is the need to lock a table for the duration of a transaction. I don't immediately see a good way to solve this problem in a single operation, hence the need for locking.

So the question is whether you can do this in a "Django way" as opposed to using straight SQL. Searching "django lock table" turned up some interesting links, including this snippet; there are many others that implement similar behavior.

A straight SQL linked-list style solution can be found in this Stack Overflow post. It appeared logical and succinct to me, but again it's two operations.

I'm very curious to hear how this turns out and what your final solution is, be sure to keep us updated!

Matt Baker 2009-10-28 23:37:41

The accepted answer on that post is more or less what I was proposing in the first place. I really don't think it's an implementation of the linked list concept though. I agree that locking the table is a key part of my problem, but I'm still really interested in better data structures for this too, since I don't know that flat numbering will scale well.

Paul McMillan 2009-10-28 23:42:04

The appropriate locking level is "repeatable read", which prevents data which was retrieved from being modified for the duration of the transaction, without locking the rest of the table.

Paul McMillan 2009-10-28 23:49:31

"Premature optimization is the root of all evil!" ;) It sounds like you've got an upper bound in mind, why not test the flat-number approach with 50,000 entries and see how it scales? That'll help inform your decision, since I'm sure implementing a data structure will carry its own cost/benefit trade-offs.

Matt Baker 2009-10-28 23:51:42

The PostgreSQL docs say that it only provides 2 levels of actual isolation, so serializable seems to be the only option: http://www.postgresql.org/docs/current/static/transaction-iso.html

Paul McMillan 2009-10-28 23:56:59

Answer 2

+1 A:

You can solve the renumbering issue by making the order column an integer that is always an even number. When you are moving the data, you change the order field to the new sort value + 1 and then do a quick update to convert all the odd order fields to even:

update XYZ set sort_order = sort_order - 1  -- odd -> even, same effect as clearing the low bit
where mod(sort_order, 2) = 1                -- assumes non-negative sort_order values

Thus you can keep the uniqueness of sort_order as a constraint.
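A minimal sketch of the even/odd two-pass idea, using Python's built-in sqlite3 module rather than PostgreSQL. The table name xyz, the column names, the helper rotate_even_odd and the sample values are assumptions made for illustration only. SQLite checks the UNIQUE constraint row by row, so the example shows that parking rows on odd values avoids collisions while a set of rows swaps places:

```python
import sqlite3


def rotate_even_odd(conn, new_orders):
    """Reassign sort orders in two passes. new_orders maps id -> target even value."""
    with conn:  # one transaction: commit on success, roll back on error
        # Pass 1: park each row on target + 1. Settled rows are always even,
        # so an odd value cannot collide with any of them.
        conn.executemany(
            "UPDATE xyz SET sort_order = ? WHERE id = ?",
            [(order + 1, row_id) for row_id, order in new_orders.items()],
        )
        # Pass 2: turn every odd value back into the even value below it.
        conn.execute(
            "UPDATE xyz SET sort_order = sort_order - 1 WHERE sort_order % 2 = 1"
        )


even_odd_conn = sqlite3.connect(":memory:")
even_odd_conn.execute(
    "CREATE TABLE xyz (id TEXT PRIMARY KEY, sort_order INTEGER NOT NULL UNIQUE)"
)
even_odd_conn.executemany(
    "INSERT INTO xyz VALUES (?, ?)", [("A", 10), ("B", 24), ("C", 26), ("E", 34)]
)
even_odd_conn.commit()

# Rotate the four rows one position up; A becomes the last entry.
rotate_even_odd(even_odd_conn, {"B": 10, "C": 24, "E": 26, "A": 34})
even_odd_result = [
    row[0] for row in even_odd_conn.execute("SELECT id FROM xyz ORDER BY sort_order")
]
```

The second pass still touches every parked row, so the write count is not reduced; the sketch only shows that the unique constraint survives the shuffle.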

EDIT: Okay, looking at the question again, I've started a new answer.

jmucchiello 2009-10-31 22:12:44

This is a pretty workable solution. Any comments on the performance of this two-pass even/odd process vs. just allowing the fields to be non-unique and locking the rows during the transaction?

Paul McMillan 2009-11-02 01:16:31

There are too many variables: DBMS, index type, number of rows in table, % of rows modified, other updates within the same transaction, etc. You would need to profile it with good sample data. The most important step is to have a DBMS that can do the update without doing a table scan. Some DBMS have a hard time using indexes when you apply functions to the indexed column.

jmucchiello 2009-11-02 02:44:28

Firstly, this solution doesn't account for the gap caused by moving the item from its old position. Secondly, any solution using a simple sort-order column will result in multiple writes on reordering. Using this two-pass mechanism you will ALWAYS have a number of writes AT LEAST equal to the number of records in your scope, as well as modification of the index for those records, which will certainly affect database performance. Finally, you are still going to need to lock the table to make the operation atomic - there is no benefit over your original solution.

Matt 2009-11-02 11:29:34

If your primary concern is your unique constraint, you could always create the space before repositioning the target. I've updated my answer to account for this solution.

Matt 2009-11-02 11:30:17

Matt, no gaps are needed. In the sample, 4 entries are selected and their sort order is rotated one position up with the top entry becoming the last entry. The trick is updating the 4 sort order fields simultaneously. See my second answer with the temporary table for a method to do that.

jmucchiello 2009-11-03 19:27:23

Answer 3

+1 A:

Why not use a simple character field of some length, with a max of 16 (or 255) initially?

Start initially with labeling things aaa through zzz (that should be 17576 entries). (You could also add in 0-9, and the uppercase letters and symbols for an optimization.)

As items are added, they can go to the end up to the maximum you allow for the additional 'end times' (zzza, zzzaa, zzzaaa, zzzaab, zzzaac, zzzaad, etc.)

This should be reasonably simple to program, and it's very similar to the Dewey Decimal system.

Yes, you will need to rebalance it occasionally, but that should be a simple operation. The simplest approach is two passes: pass 1 would be to set the new ordering tag to '0' (or any character earlier than the first character) followed by the new tag of the appropriate length, and pass 2 would be to remove the '0' from the front.

Obviously, you could do the same thing with floats and rebalance them regularly; this is just a variation on that. The one advantage is that most databases will allow you to set a ridiculously large maximum size for the character field, large enough to make it very, very, very unlikely that you would run out of digits to do the ordering, and also make it unlikely that you would ever have to modify the schema, while not wasting a lot of space.
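A rough sketch of how the labels described above might be generated, assuming a lowercase-only alphabet and a fixed width of 3. The function names label and rebalance are hypothetical, and spreading the labels evenly during a rebalance is one possible choice rather than something the answer prescribes:

```python
import string

ALPHABET = string.ascii_lowercase  # 26 symbols, so width 3 gives 26**3 = 17576 labels


def label(index, width=3):
    """Return the fixed-width base-26 label for index: 0 -> 'aaa', 17575 -> 'zzz'."""
    if not 0 <= index < len(ALPHABET) ** width:
        raise ValueError("index does not fit in the given width")
    chars = []
    for _ in range(width):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def rebalance(count, width=3):
    """Return `count` evenly spaced labels, leaving gaps between neighbours."""
    space = len(ALPHABET) ** width
    if not 0 < count <= space:
        raise ValueError("count must be between 1 and the number of labels")
    step = space // count
    return [label(i * step, width) for i in range(count)]


fresh_labels = rebalance(4)
```

Because every label has the same width, plain string comparison sorts them in the same order as the underlying integers, which is what lets the database order by the character column directly.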

Ralph 2009-10-31 22:14:31

Answer 4

+4 A:

Preferred solutions:

A linked list would be the usual way to achieve this. A query to return the items in order is trivial in Oracle, but I'm not sure how you would do it in PostgreSQL.

Another option would be to implement this using the ltree module for PostgreSQL.
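For the linked-list option, one way to read the rows back in order is a recursive common table expression (WITH RECURSIVE), which PostgreSQL has supported since 8.4. The sketch below runs the same kind of query against SQLite through Python's sqlite3 module; the items table, its next_id column and the sample rows are assumptions for illustration:

```python
import sqlite3

list_conn = sqlite3.connect(":memory:")
# Each row points at the row that follows it; the last row points at NULL.
list_conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, next_id TEXT)")
list_conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("C", "E"), ("A", "B"), ("E", None), ("B", "C")],
)

ORDERED_ITEMS = """
WITH RECURSIVE chain(id, next_id, depth) AS (
    -- The head is the only row that no other row points at.
    SELECT id, next_id, 0
    FROM items
    WHERE id NOT IN (SELECT next_id FROM items WHERE next_id IS NOT NULL)
    UNION ALL
    -- Follow the next_id pointer one hop at a time.
    SELECT i.id, i.next_id, c.depth + 1
    FROM items AS i
    JOIN chain AS c ON i.id = c.next_id
)
SELECT id FROM chain ORDER BY depth
"""

linked_order = [row[0] for row in list_conn.execute(ORDERED_ITEMS)]
```

Moving an item in this layout only rewrites the few pointers around the old and new positions, at the cost of a recursive read.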

Less graceful (and write-heavy) solution: Start a transaction. Use "select for update" within scope for row-level locks. Move the target record to position 0, update the target's future succeeding records to +1 where their position is higher than the target's original position (or vice versa), and then update the target to the new position - a single additional write over that needed without a unique constraint. Commit :D

Simple (yet still write-heavy) solution if you can wait for PostgreSQL 8.5 (Alpha is available) :)

Wrap it in a transaction, select for update in scope, and use a deferred constraint (PostgreSQL 8.5 has support for deferred unique constraints like Oracle).
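The "park at position 0, shift, then place" sequence could look roughly like the sketch below. It uses Python's sqlite3 module, which has no "select for update", so the row locking step is left out. The items table, the move helper and the choice to shift rows one at a time in a collision-free order are assumptions, not part of the answer:

```python
import sqlite3


def move(conn, item_id, new_pos):
    """Move one row to new_pos while keeping position unique at every step."""
    with conn:  # commit on success, roll back on error
        (old_pos,) = conn.execute(
            "SELECT position FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        if old_pos == new_pos:
            return
        # Park the target on 0, a value no regular row uses (positions start at 1).
        conn.execute("UPDATE items SET position = 0 WHERE id = ?", (item_id,))
        if old_pos < new_pos:
            # Moving down: rows in (old, new] step back by one, lowest first.
            rows = conn.execute(
                "SELECT id, position FROM items"
                " WHERE position > ? AND position <= ? ORDER BY position ASC",
                (old_pos, new_pos),
            ).fetchall()
            delta = -1
        else:
            # Moving up: rows in [new, old) step forward by one, highest first.
            rows = conn.execute(
                "SELECT id, position FROM items"
                " WHERE position >= ? AND position < ? ORDER BY position DESC",
                (new_pos, old_pos),
            ).fetchall()
            delta = 1
        for row_id, pos in rows:
            conn.execute(
                "UPDATE items SET position = ? WHERE id = ?", (pos + delta, row_id)
            )
        # The extra write: drop the target into the slot that was just freed.
        conn.execute("UPDATE items SET position = ? WHERE id = ?", (new_pos, item_id))


move_conn = sqlite3.connect(":memory:")
move_conn.execute(
    "CREATE TABLE items (id TEXT PRIMARY KEY, position INTEGER NOT NULL UNIQUE)"
)
move_conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
)
move_conn.commit()

move(move_conn, "a", 3)
after_first_move = [
    r[0] for r in move_conn.execute("SELECT id FROM items ORDER BY position")
]
move(move_conn, "d", 1)
after_second_move = [
    r[0] for r in move_conn.execute("SELECT id FROM items ORDER BY position")
]
```

Shifting the rows individually in a fixed order is what keeps a non-deferred unique constraint satisfied here; a single set-based shift may or may not collide, depending on the order in which the engine visits the rows.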

Matt 2009-10-31 22:25:25

ltree module in postgres is an interesting suggestion. I'll go take a look at that.

Paul McMillan 2009-10-31 22:33:28

Also interesting that ltree supports b-tree indexing out of the box.

Paul McMillan 2009-10-31 22:36:17

Locking the entire table is quite undesirable because the system is intended to support many simultaneous updates.

Paul McMillan 2009-11-02 20:46:27

Updated solution to remove full table lock and implement row level locks. Also added potential solution if you can wait for 8.5. Give me your bounty damnit ;)

Matt 2009-11-03 02:54:26

The answer keeps improving as the bounty is open! The deferred constraint idea is a good one.

Paul McMillan 2009-11-03 19:13:02

I'll keep fighting for it ;)

Matt 2009-11-03 19:52:21

Answer 5

+3 A:

A temp table and a transaction should maintain atomicity and the unique constraint on sort order. Restating the problem, you want to go from:

A  10   to  B  10
B  25       C  25
C  26       E  26
E  34       A  34

Where there can be any number of items in between each row. So, first you read in the records and create a list [['A',10],['B',25],['C',26],['E',34]]. Through some pythonic magic you shift the identifiers around and insert them into a temp table:

create temporary table reorder (
    id varchar(20), -- whatever
    sort_order integer, -- "number" is an Oracle type; PostgreSQL uses integer or numeric
    primary key (id));

Now for the update:

update XYZ
set sort_order = (select sort_order from reorder where xyz.id = reorder.id)
where id in (select id from reorder)

I'm only assuming pgsql can handle that query. If it can, it will be atomic.

Optionally, create table REORDER as a permanent table and the transaction will ensure that attempts to reorder the same record twice will be serialized.

EDIT: There are some transaction issues. You might need to implement both of my ideas. If two processes both want to update item B (for example) there can be issues. So, assume all order values are even:

1. Begin the transaction.
2. Increment all the orders being used by 1. This puts row level write locks on all the rows you are going to update.
3. Select the data you just updated. If any sort_order fields are even, some other process has added a record that matches your criteria. You can either abort the transaction and restart, or you can just drop the record and finish the operation using only the records that were updated in step 2. The "right" thing to do depends on what you need this code to accomplish.
4. Fill your temporary reorder table as above using the proper even sort_orders.
5. Update the main table as above.
6. Drop the temporary table.
7. Commit the transaction.

Step 2 ensures that if two lists overlap, only the first one will have access to the row in question until the transaction completes:

update XYZ set sort_order = sort_order + 1
where -- whatever your select criteria are

select * from XYZ
where -- same select criteria
order by sort_order
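Putting the seven steps together, a sketch using Python's sqlite3 module might look like this. The reorder_rows helper, the id-based selection criteria and the all-even sample values are assumptions. SQLite serializes writers at the database level, so the row-level locking that step 2 relies on in PostgreSQL is not actually exercised here:

```python
import sqlite3


def reorder_rows(conn, new_orders):
    """Apply new even sort orders. new_orders maps id -> final sort_order."""
    ids = list(new_orders)
    marks = ",".join("?" * len(ids))
    try:
        # Steps 1-2: the first write opens the transaction; bumping by 1 makes
        # the claimed rows odd (and would take row locks in PostgreSQL).
        conn.execute(
            f"UPDATE xyz SET sort_order = sort_order + 1 WHERE id IN ({marks})", ids
        )
        # Step 3: an even value now means someone else touched a row we wanted.
        claimed = conn.execute(
            f"SELECT id, sort_order FROM xyz WHERE id IN ({marks})", ids
        ).fetchall()
        if any(order % 2 == 0 for _, order in claimed):
            raise RuntimeError("rows changed by another process; retry")
        # Step 4: stage the final even values in a temporary table.
        conn.execute(
            "CREATE TEMP TABLE reorder (id TEXT PRIMARY KEY, sort_order INTEGER)"
        )
        conn.executemany("INSERT INTO reorder VALUES (?, ?)", new_orders.items())
        # Step 5: copy the staged values into the main table in one statement.
        conn.execute(
            "UPDATE xyz SET sort_order ="
            " (SELECT sort_order FROM reorder WHERE reorder.id = xyz.id)"
            " WHERE id IN (SELECT id FROM reorder)"
        )
        # Steps 6-7: drop the staging table and commit.
        conn.execute("DROP TABLE reorder")
        conn.commit()
    except Exception:
        conn.rollback()
        raise


steps_conn = sqlite3.connect(":memory:")
steps_conn.execute(
    "CREATE TABLE xyz (id TEXT PRIMARY KEY, sort_order INTEGER NOT NULL UNIQUE)"
)
steps_conn.executemany(
    "INSERT INTO xyz VALUES (?, ?)", [("A", 10), ("B", 24), ("C", 26), ("E", 34)]
)
steps_conn.commit()

reorder_rows(steps_conn, {"B": 10, "C": 24, "E": 26, "A": 34})
steps_result = [
    r[0] for r in steps_conn.execute("SELECT id FROM xyz ORDER BY sort_order")
]
```

Because every claimed row sits on an odd value by the time step 5 runs, the even targets are all free, so the unique constraint holds even on an engine that checks it row by row.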

Alternatively, you can add a control field to the table to get the same effect, and then you don't need to play with the sort_order field. The benefit of using the sort_order field is that indexing a BIT field or a LOCK_BY_USERID field that is usually null tends to perform poorly, since 99% of the time the index is meaningless. SQL engines don't like indexes that spend most of their time empty.

jmucchiello 2009-11-02 14:44:40
