+1  A: 

Using a Hashtable (Java), Dictionary (.NET), or the equivalent hash-based container in your language of choice (hash_set or hash_map in the STL) will give you O(1) average-case inserts during the counting phase, unlike a binary search tree, where an insert is somewhere between O(log n) and O(n) depending on whether the tree balances itself. If performance really is that important, just make sure you initialize your hash table with a large enough capacity that it won't need to resize itself dynamically, which can be expensive.

As for listing by frequency, I can't immediately think of a tricky way to do that without involving a sort, which would be O(n log n).
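
As a minimal sketch of that approach in C++ (using std::unordered_map as the hash container; the function and variable names here are illustrative, not from the question):

#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Count word frequencies with O(1) average-case inserts, then sort once at the end.
std::vector<std::pair<std::string, int>> count_and_rank(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    counts.reserve(words.size());          // pre-size so the table never has to rehash
    for (const std::string& w : words)
        ++counts[w];

    // One O(n log n) sort by descending frequency for the output phase.
    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    return ranked;
}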

Phil
You could maintain two data structures in parallel: a hashtable for quick lookups, and a map/sorted array to keep your data sorted. But then every add/remove operation has to be applied to both structures (possibly in two separate threads) if you want to keep the best of both worlds. Depending on Neeraj's needs, it could be harder to maintain.
yves Baumes
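
For illustration, a single-threaded sketch of that parallel-structures idea (hash map for lookup, frequency-keyed multimap for ordered iteration; names are assumptions, not from the question):

#include <map>
#include <string>
#include <unordered_map>

// Two views of the same data: counts for O(1) lookup, by_freq for sorted traversal.
// Every update must touch both, which is the maintenance cost mentioned above.
std::unordered_map<std::string, int> counts;
std::multimap<int, std::string> by_freq;

void add_word(const std::string& word)
{
    int& count = counts[word];
    if (count > 0)
    {
        // Remove the stale frequency entry before re-inserting with the new count.
        auto range = by_freq.equal_range(count);
        for (auto it = range.first; it != range.second; ++it)
            if (it->second == word) { by_freq.erase(it); break; }
    }
    ++count;
    by_freq.insert({count, word});
}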
+1  A: 

A map or BST is good if you need sorted output for your dictionary.

They are also good if you need to mix add, remove, and lookup operations. I don't think that is what you need here: you load the dictionary, sort it, then only do lookups in it, right? In that case a sorted array is probably a better container (see Item 23 of Effective STL by Scott Meyers).
(Update: simply consider that a map is likely to generate more memory cache misses than a sorted array, since an array keeps its data contiguous in memory, while each node in a map contains pointers to other nodes in the map. When your objects are simple and don't take much space in memory, a sorted vector is probably the better option. I warmly recommend reading that item from Meyers's book.)

For the kind of sort you are talking about, you will need this algorithm from the STL: stable_sort. The idea is to sort the dictionary by word first, then run stable_sort() on the frequency key, so entries with equal frequency stay in alphabetical order.

It will give something like this (not actually tested, but you get the idea):

#include <algorithm>
#include <string>
#include <vector>

struct Node
{
    std::string word; // the dictionary word
    int key;          // its frequency count
};

// Primary ordering: alphabetical by word
bool operator<(const Node& l, const Node& r)
{
    return l.word < r.word;
}

// Secondary ordering: by frequency
bool freq_comp(const Node& l, const Node& r)
{
    return l.key < r.key;
}

std::vector<Node> my_vector;
// ... load the elements ...
std::sort(my_vector.begin(), my_vector.end());                   // sort by word
std::stable_sort(my_vector.begin(), my_vector.end(), freq_comp); // then by frequency; ties stay alphabetical
yves Baumes
A: 

One approach you could consider is to build two trees. One indexed by word, one indexed by freq.

As long as the tree nodes contain a pointer to the data node, you could access it via the word-based tree to update the info, but later access it via the freq-based tree for output.
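
A rough sketch of that layout using the standard ordered containers as the two trees (the names and the shared-pointer scheme are assumptions for illustration):

#include <map>
#include <memory>
#include <string>

struct Entry
{
    std::string word;
    int freq = 0;
};

// Both trees point at the same Entry objects.
std::map<std::string, std::shared_ptr<Entry>> by_word;  // update counts through this tree
std::multimap<int, std::shared_ptr<Entry>> by_freq;     // iterate this tree for frequency-ordered output

void build_freq_index()
{
    // Once counting is finished, index every entry by its frequency.
    by_freq.clear();
    for (const auto& kv : by_word)
        by_freq.insert({kv.second->freq, kv.second});
}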

Although, if speed is really that important, I'd be looking to get rid of the string as a key. String comparisons are notoriously slow.

If speed is not important, I think your best bet is to gather the data based on word and re-sort based on freq as yves has suggested.

MrAleGuy
You can even still have the data live in the nodes, if your nodes are simultaneously members of both trees (`struct node { data; struct node *word_left, *word_right, *freq_left, *freq_right }`)
caf
+1  A: 

Here is my suggestion for re-balancing the tree based on the new keys (well, I have two suggestions).

The first and more direct one is to somehow adapt Heapsort's "bubble-up" function (to use Sedgewick's name for it). Here is a link to Wikipedia; there they call it "sift-up". It is not designed for an entirely unbalanced tree (which is what you'd need), but I believe it demonstrates the basic flow of an in-place reordering of a tree. It may be a bit hard to follow because the tree is actually stored in an array rather than as a linked tree (though the logic in a sense treats it as a tree) --- perhaps, though, you'll find such an array-based representation is best! Who knows.
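
For reference, a minimal array-based sift-up in C++ (the classic max-heap version, which would still need adapting for a linked tree):

#include <utility>
#include <vector>

// Move the element at index i up toward the root while it is larger than its parent.
// In the array representation, the parent of index i lives at (i - 1) / 2.
void sift_up(std::vector<int>& heap, std::size_t i)
{
    while (i > 0)
    {
        std::size_t parent = (i - 1) / 2;
        if (heap[i] <= heap[parent])
            break;
        std::swap(heap[i], heap[parent]);
        i = parent;
    }
}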

The more crazy-out-there suggestion of mine is to use a splay tree. I think they're nifty, and here's the wiki link. Basically, whichever element you access is "bubbled up" to the top, but it maintains the BST invariants. So you maintain the original Key1 for building the initial tree, but hopefully most of the "higher-frequency" values will also be near the top. This may not be enough (as all it will mean is that higher-frequency words will be "near" the top of the tree, not necessarily ordered in any fashion), but if you do happen to have or find or make a tree-balancing algorithm, it may run a lot faster on such a splay tree.

Hope this helps! And thank you for an interesting riddle, this sounds like a good Haskell project to me..... :)

Agor
+1  A: 

You can easily do this in O(1) space, but not in O(1) time ;-)

Even though re-arranging a whole tree recursively until it is sorted again seems possible, it is probably not very fast - it may be O(n) at best, probably worse in practice. So you might get a better result by adding all nodes to an array once you are done with the tree and just sorting that array by frequency with quicksort (which is O(n log n) on average). At least that's what I would do. Even though it takes extra space, it sounds more promising to me than re-arranging the tree in place.
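
A sketch of that flatten-then-sort step, assuming a simple node type along the lines of the earlier answer (the type and function names are illustrative):

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct TreeNode
{
    std::string word;
    int freq = 0;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// In-order walk: copy every node's (word, freq) pair into the array.
void flatten(const TreeNode* node, std::vector<std::pair<std::string, int>>& out)
{
    if (!node) return;
    flatten(node->left, out);
    out.push_back({node->word, node->freq});
    flatten(node->right, out);
}

// Then a single O(n log n) sort by descending frequency.
void sort_by_freq(std::vector<std::pair<std::string, int>>& out)
{
    std::sort(out.begin(), out.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
}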

Mecki
But O(1) space is the *fun* way! :D
Agor
+1  A: 

I think you can create a new tree sorted by frequency and push all the elements into it as you pop them from the old tree.

Each insertion could be close to O(1), though more likely O(log N), which isn't a big cost anyway.
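
A sketch of that rebuild using a frequency-keyed std::multimap as the "new tree" (the names are assumptions for illustration):

#include <map>
#include <string>

// Pop every (word, count) pair from the word-keyed tree and push it into a
// tree keyed by frequency; repeated counts are why a multimap is used.
std::multimap<int, std::string> rebuild_by_freq(std::map<std::string, int>& by_word)
{
    std::multimap<int, std::string> by_freq;
    while (!by_word.empty())
    {
        auto it = by_word.begin();
        by_freq.insert({it->second, it->first});
        by_word.erase(it);
    }
    return by_freq;
}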

Also, I don't know what the equivalent is called in C#, but in Python you can use a list and sort it in-place by two different keys.

ilya n.