views:

111

answers:

6

Well this is going to sound like a lame question, I know. This is probably one of the most obvious things to do, but I've been trying to come up with a good way of sorting entries in database.

Image table looking like this:

entry_id   |   entry_data
-------------------------
1          | data1
2          | data2
...        | ...
n          | datan

entry_id is of course Primary Key with AI option so all entries have a unique id. Now. I want to order that data. Say I want entry_id = 2 to be first without changing its entry_id. Then the table needs another column to store order number. I tried 2 approaches.

  1. entry_order_no: this will basically keep order number of the items. Any new item will be always added to the end.

  2. left_id + right_id: this approach is used by phpBB. Not sure why it requires right_id though?

I seem to prefer the 1st approach, but how do you sort the data later. Let's assume I want to newly added item move from the last position, to say 2nd. Way I've been doing it I was using indexed tables, where index was order no. Each index contained associative array, that had entry_id. That way I "simply" did foreach loop with an UPDATE on all entries from 2nd place down. That may work for few rows. Maybe even tens. But what when you have hundreds? This seems highly inefficient then. Second one seems a bit better. But something still tells me there's a better approach.

Please do suggest.

A: 

Back when we programmed in primeval Basic we numbered lines "10, 20, 30, ..." so if we ever needed to insert a line between existing ones we could just number it, say, 25, without renumbering all others. You could use a similar trick for your "entry order", if its sole purpose is to keep the rows' order and yet let you change that order with a minimum of fuss, without renumbering everything (mysql internally will have to update the index on that column, but that's still faster than doing it yourself, more likely than not -- try some benchmarks on your setup to confirm this). Use BIGINT and a large initial increment, say 1024, and you should be fine for quite a few reorderings before you need to rebuild that column.

Alex Martelli
A: 

An interesting question. The left_id/right_id is probably an implementation of the nested set approach to storing hierarchical (as opposed to sequential) data. See Trees in SQL.

There are two approaches I have taken for this problem. One is the brute force one you are describing, where you adjust all of the numbers as necessary whenever an entry is added/changed/deleted. The other is to just maintain a rank, so when an entry's rank is elevated, you just increment its rank number without modifying any of the others entries' ranks. Each time the user clicks the up arrow, for example, the entry's rank would get 1 added to it, and you re-render the list sorted by rank, in which case the node moves up one level in the list.

Neither of these approaches are ideal once you get a large set of data, but typically at that point you are no longer manually maintaining the sort order, and have devised an algorithm to sort automatically.

RedFilter
I too though of Celko when I saw this question. IIRC from reading his book on trees and hierarchies he favoured using widely-spaces values so that a new value could be inserted half-way between the two, like Alex Martelli describes in this thread.
onedaywhen
+1  A: 

phpBB is using a smarter approach: the left_id and right_id correspond to nodes of a tree as part of a nested set. This is probably the route you want to take if performance is going to be a problem (since you seem to be worrying about that). Here's a very thorough implementation walkthrough of nested sets.

But note that databases are usually meant for unordered data. To get sorted data, you typically retrieve all rows you are interested in and then do post-processing -- for example, with an ORDER BY clause or by sorting the results once they've come back.

That said, where you need to store an ordering as part of the data itself (for example, because it can't be computed or is based on user preference), this is typically done with another table that holds the ordering or by additional columns in a nested-set approach, as noted above.

John Feminella
+2  A: 

I'm not sure the true nature of why you're trying to do this, so it's kind of hard to say, but as far as your inefficient foreach loop, stop using iterative methods on sets - use set operations, they're way faster and exactly what a DB was meant for. so instead of...

for row in db>2
 entryorderno += 1

do this...

UPDATE Table
SET entryorderno = entryorderno + 1
WHERE entryorderno > 2

Also, pulling from what Alex Martelli said, doing that should be pretty ok for awhile, but eventually you'll have to restack everything with new spacing....but a clustered index on that field would keep things in order by that ID....of course it also means inserts to the middle of the table could be resource intensive if it's a large table.

Jody
Well your suggestion seems so obvious now. I knew there must be some simple solution to this.
Michal M
A: 

If you require an artifical ordering (an ordering you cannot create dynamicaly by sorting on some columns), you have to add a Position column. When you then reorder two rows, you have to change all rows between them if you use a dense coding.

You have the option to use a lose coding - for example the first row gets position 100, the second 200, the third 300, and so on. This will increase the complexity of your logic but allow you to perform a number of reordering operations without the need to modify more than one row.

Daniel Brückner
A: 

How about storing a simple sort order column and using triggers and stored fucntions in your database?

The trigger would fire whenever you update an order column and increment all orders greater than the new order by 1. Hundreds or thousands of records should not be a problem for any decent database, especially when the processing is done internally to the DB, (I would not recommend doing this from within your application) though once you get into the tens of millions, updating that many rows would get troublesome.

Depends on the size of your data.

garrow