views:

659

answers:

8

Hi,

We all know that a hash table has O(1) time for both inserts and look-ups if the hash function was well chosen. So, what are the reason we want to use Binary Search Tree? Just because a perfect hash function was hard to design?

Here how I come up with this question? I notice that Standard C++ STL has set and map which are implemented with Binary search tree ,but has no hash (not talking about non-stardard hash_set , hash_map). While, Ruby has only Hash. I want to understand the rational behind this difference.

+1  A: 

Well search trees are ordered, hashes are not.

Nifle
This seems only matters when traversalling it.
pierr
+1  A: 

You can access the data in a binary search tree in order.

Stephen Denne
+5  A: 

Trees allow in-order traversion.

The worst case performance for a hash table is O(N) (linear search through one bucket), a binary search is bound by O(log N).

While such a degradation is unlikely, it is not impossible and depends strongly on the ability to chose an appropriate hash function and the distribution of the actual data.

A tree implementation also grows trivially to the required size, whereas a hashmap starts to degrade when it gets full (for most implementations, it's said around 70% of the buckets filled). You either need to rehash the entire table (again, bad fo real time apps), or incrementally move to a new table, which is not a simple implementation.

In the end, STL probably just went with one "base" container template, the tree, to avoid the additional implementation complexity.

peterchen
+2  A: 

To add on peterchen answer, hash structures although theoretically faster at insertion and removal depend vastly on the actual data, the chosen hash function and the amount of data.

  • A perfect hash function depends on the amount and distribution of the data.

Having large performance variations between best and worst cases makes them unfit for general purpose structures. Binary trees on the other hand are more predictable independently of the amount/type of data used, even though less efficient on best case scenario.

Vasco Fernandes
+1  A: 

The STL didn't initially include a hash table among the containers as hash tables are more complex - you have to choose between open and closed addressing, not to mention the hash function, etc. At the time, Stepanov and Stroustrup were trying to speed up progress on it so that it was quickly accepted into the standard.

Trees on the other hand, are relatively simpler. It was already known that since these are in-memory data structures, we can just use a binary tree instead of a B-tree. Then it was a choice between AVL and RB trees. RB trees tend to be chosen due to better performance characteristics which I am not in a position to comment on, but the Wikipedia articles on both structures (AVL and RB) will tell you more in relatively good detail.

Otherwise, trees and hash tables are good for different things. If you need fast inserts or retrievals, and couldn't care about the order they are stored in, hash tables are good. If you need ordering characteristics and strong guarantees on inserts and retrievals, then binary trees are good. Another good rule of thumb is to profile. Since most uses of either would be interface compatible, profiling to see which gives you better performance helps too.

blwy10
+1  A: 

To use a tree you need a way to order items in the tree. To use a hash table you need a function to compute the hash value of an item in the hash table.

Interestingly the .NET framework requires every class to implement (or inherit) the GetHashCode function enabling every object to be stored in a hash table. However, this also adds an additional burden on developers that are required to implement semantically correct hash functions even if they don't intend the class to be hashed. One solution is to return a constant value from GetHashCode which is semantically correct, but not very efficient should the function ever be used for hashing.

Martin Liversage
+1  A: 

At the time of C++ people were still fans of hard-core academic approach to data structures and algorithms, so they preferred structures with smaller memory footprint and well-understood best- and worst- case behavior.

By the time Ruby appeared, and for the purposes of scripting, people realized they favor simplicity over raw performance, and since hashtables allow semantics of both arrays (if you use sequential index as the key) AND of dictionaries (if you use natural key), they were deemed as more universal data structure.

zvolkov
A: 

what is the best searching method in binary search and hash searching?

siva