views:

115

answers:

3

Intro

Consider you have a list of key/value pairs:

(0,a) (1,b) (2,c)

You have a function, that inserts a new value between two current pairs, and you need to give it a key that keeps the order:

(0,a) (0.5,z) (1,b) (2,c)

Here the new key was chosen as the average between the average of keys of the bounding pairs.

The problem is, that you list may have milions of inserts. If these inserts are all put close to each other, you may end up with keys such to 2^(-1000000), which are not easily storagable in any standard nor special number class.

The problem

How can you design a system for generating keys that:

  • Gives the correct result (larger/smaller than) when compared to all the rest of the keys.
  • Takes up only O(logn) memory (where n is the number of items in the list).

My tries

  • First I tried different number classes. Like fractions and even polynomium, but I could always find examples where the key size would grow linear with the number of inserts.
  • Then I thought about saving pointers to a number of other keys, and saving the lower/greater than relationship, but that would always require at least O(sqrt) memory and time for comparison.

Extra info: Ideally the algorithm shouldn't break when pairs are deleted from the list.

A: 

Perhaps you can use the Farey Sequence? Each key will be a number in reduced form a/b.

If you insert between a/b and a1/b1, the key value you give the new one is (a+a1)/(b+b1).

So you store the ordered pair (a,b) in reduced form (i.e. find the gcd and divide out: Euclid's GCD algorithm).

Initally you start out with 0 and 1. Since the the nth farey sequence has a/b where both a <= n and b <= n, your O(log n) space requirement will likely also hold.

If a/b > c/d (where a,b,c,d are non-negative integers) then we have that a/b > (a+b)/(c+d) > c/d, so it seems like this should work, even with arbitrary deletions.

Related: Stern-Brocot Tree.

Hope it helps.

Moron
With (a+a1)/(b+b1) worst case would still be a doubling denominator every seccond insert? But definitely an interesting idea.
Thomas Ahle
You take the gcd, but yes, it could potentially happen. The O(logn) need not hold in that case, you are right.
Moron
Actually it seams to worstcase in the Fibonacci numbers. So something like `O(1.6^n)`, which is slightly better than the `O(2^n)` from averages.
Thomas Ahle
Yes that seems right. I was more concerned about the average, though: the total number of bits used. It would be interesting to figure that out on a random insertion sequence, haven't thought about it, though.
Moron
+2  A: 

I agree with snowlord. A tree would be ideal in this case. A red-black tree would prevent things from getting unbalanced. If you really need keys, though, I'm pretty sure you can't do better than using the average of the keys on either side of the value you need to insert. That will increase your key length by 1 bit each time. What I recommend is renormalizing the keys periodically. Every x inserts, or whenever you detect keys being generated too close together, renumber everything from 1 to n.

Edit: You don't need to compare keys if you're inserting by position instead of key. The compare function for the red-black tree would just use the order in the conceptual list, which lines up with in-order in the tree. If you're inserting in position 4 in the list, insert a node at position 4 in the tree (using in-ordering). If you're inserting after a certain node (such as "a"), it's the same. You might have to use your own implementation if whatever language/library you're using requires a key.

Adam Crume
Err... How? Care to explain? What is the compare function used to insert in the red-black tree?
Moron
@Moron: You could read up on wikipedia if you'd like to know more about red-black trees, but one interesting quote is 'it can search, insert, and delete in O(log n) time'.Red-Black Trees: http://en.wikipedia.org/wiki/Red-black_tree
Peter Jaric
@snowlord! I know what red-black tree is. The point is, for using a red black tree you _need_ to have a compare function already defined which can compare _any two_ given elements, to decide which branch of the tree to take when inserting a new element. Seems like you had something totally different in mind. -1 till you edit your post to make it clear how you plan to use a red-black tree. Oh sorry, it is not snowlord's answer, but still the vote stands.
Moron
I think a downvote is a little harsh just because you don't think my answer is clear enough. Shouldn't that be reserved for spam, abuse, or abject stupidity?
Adam Crume
So you think I should keep both a tree and a list, so the tree can check the list for the order? But then inserting or looking up in the list, would take a long time..
Thomas Ahle
@Adam: Easy... For spam/abuse the post gets flagged and deleted. As to the answer, you have removed the need for the key, which OP never said you could. Anyway, since your edited answer might well be useful to the OP, I have removed the downvote. It is nothing personal and I suggest your read the FAQ about how the voting system is supposed to work. Also, I suggest you lookup order statistic tree ('using in-ordering' is still a bit unclear).
Moron
@Thomas: I'm not suggesting keeping a list also. The list is implicit in the tree.@Moron: I only suggested that keys might not be necessary, but did not assume that, hence "If you really need keys..." Besides, sometimes the answer is that the question makes an unnecessary assumption. Also, I don't see how in-order is unclear. It's standard terminology.
Adam Crume
@Adam: If you are inserting by rank, it is obvious you are inserting 'in-order'. What extra information have you added by mentioning in-order? Question is how you actually do the inserts... This is quite different from the 'standard' way of inserting elements into a tree, isn't it? That is the clarity that was being sought. Anyway, just FYI: order statistic trees maintain a count of the left descendants and right descendants and that makes inserting by rank (or 'by in-order') O(logn) time. Anyway, I guess I have made my point clear.
Moron
+1  A: 

I don't think you can avoid getting size O(n) keys without reassigning the key during operation.

As a practical solution I would build an inverted search tree, with pointers from the children to the parents, where each pointer is marked whether it is coming from a left or right child. To compare two elements you need to find the closest common ancestor, where the path to the elements diverges.

Reassigning keys is then rebalancing of the tree, you can do that by some rotation that doesn't change the order.

starblue
Ah right. The keys are actually designed to lock the order in case of rebalancing. Maybe instead, I should focus on a smarter rotation scheme.
Thomas Ahle