views:

1859

answers:

6

Hey guys,

I'm trying to find the definition of a binary search tree and I keep finding different definitions everywhere.

Some say that for any given subtree the left child key is less than or equal to the root.

Some say that for any given subtree the right child key is greater than or equal to the root.

And my old college data structures book says "every element has a key and no two elements have the same key."

Is there a universal definition of a bst? Particularly in regards to what to do with trees with multiple instances of the same key.

Thanks!

EDIT: Maybe I was unclear, the definitions I'm seeing are

1.) left <= root < right

2.) left < root <= right

3.) left < root < right, such that no duplicate keys exist.

A: 

Any definition is valid. As long as you are consistent in your implementation (always put equal nodes to the right, always put them to the left, or never allow them) then you're fine. I think it is most common to not allow them, but it is still a BST if they are allowed and place either left or right.

SoapBox
if you have a set of data which contains duplicate keys, then those items should all be stored within the 1 node on the tree via a different method (linked list, etc). the tree should only contain unique keys.
nickf
Should perhaps, but it is not required for the properties of a search tree.
SoapBox
#1 condition of a BST: "each node (item in the tree) has a distinct value;" http://en.wikipedia.org/wiki/Binary_search_tree
nickf
Wikipedia isn't the end all source of knowledge. Explain to me what is invalid about having duplicate keys in a BST other than wasting space... it maintains all important properties.
SoapBox
Also note from the wiki that the right subtree contains values "greater than or equal to" the root. Hence the wiki definition is self-contradictory.
SoapBox
+1: Different people use different definitions. If you implement a new BST, you need to make sure you're explicit about which assumptions you're making about duplicate entries.
Mr Fooz
The usual convention is less than => left, greater than => right
Mitch Wheat
Seems like the consensus is (left <= root <= right) when allowing duplicates. But that some folks definition of a BST does not allow dups. Or maybe some people teach it that way to avoid the additional complexity.
Tim Merrifield
incorrect! it's EITHER left <= root < right OR left < root <= right, OR left > root >= right OR left >= root > right
Mitch Wheat
What?! Where are you getting this definition from?
Tim Merrifield
The less than or greater than relationship either points left OR right. Where you implement the equals is implementation dependent but it must only be on one side.
Mitch Wheat
Again, provide me a link to a definition of a BST that says the relationship can go either way.
Tim Merrifield
@Mitch, it doesn't HAVE to be so, it just makes the processing of the tree a lot easier. Having said that, I'd question the sanity of anyone who implemented it otherwise (arbitrarily left or right rather than same way always).
paxdiablo
A: 

Those three things you said are all true.

  • Keys are unique
  • To the left are keys less than this one
  • To the right are keys greater than this one

I suppose you could reverse your tree and put the smaller keys on the right, but really the "left" and "right" concept is just that: a visual concept to help us think about a data structure which doesn't really have a left or right, so it doesn't really matter.

nickf
+3  A: 

In a BST, all values descending on the left side of a node are less than (or equal to, see later) the node itself. Similarly, all values descending on the right side of a node are greater than (or equal to) the nodes value.

Some BSTs may choose to allow duplicate values, hence the "or equal to" qualifiers above.

The following example may clarify:

            |
      +--- 14 ---+
      |          |
+--- 13    +--- 22 ---+
|          |          |
1         16    +--- 29 ---+
                |          |
               28         29

This shows a BST that allows duplicates - you can see that to find a value, you start at the root node and go down the left or right subtree depending on whether your search value is less than or greater than the node value.

Duplicates add a little complexity since you may need to keep searching once you've found your value for other nodes of the same value.

Thanks to Mitch Wheat for pointing out that duplicates make more sense to go either right or left but not both ways - this makes insertions and searches easier.

paxdiablo
+3  A: 

Many algorithms will specify that duplicates are excluded. For example, the example algorithms in the MIT Algorithms book usually present examples without duplicates. It is fairly trivial to implement duplicates (either as a list at the node, or in one particular direction.)

Most (that I've seen) specify left children as <= and right children as >. Practically speaking, a BST which allows either of the right or left children to be equal to the root node, will require extra computational steps to finish a search where duplicate nodes are allowed.

It is best to utilize a list at the node to store duplicates, as inserting an '=' value to one side of a node requires rewriting the tree on that side to place the node as the child, or the node is placed as a grand-child, at some point below, which eliminates some of the search efficiency.

You have to remember, most of the classroom examples are simplified to portray and deliver the concept. They aren't worth squat in many real-world situations. But the statement, "every element has a key and no two elements have the same key", is not violated by the use of a list at the element node.

So go with what your data structures book said!

Edit:

Universal Definition of a Binary Search Tree involves storing and search for a key based on traversing a data structure in one of two directions. In the pragmatic sense, that means if the value is <>, you traverse the data structure in one of two 'directions'. So, in that sense, duplicate values don't make any sense at all.

This is different from BSP, or binary search partition, but not all that different. The algorithm to search has one of two directions for 'travel', or it is done (successfully or not.) So I apologize that my original answer didn't address the concept of a 'universal definition', as duplicates are really a distinct topic (something you deal with after a successful search, not as part of the binary search.)

Chris
A: 

1.) left <= root < right

2.) left < root <= right

3.) left < root < right, such that no duplicate keys exist.

I might have to go and dig out my algorithm books, but off the top of my head (3) is the canonical form.

(1) or (2) only come about when you start to allow duplicates nodes and you put duplicate nodes in the tree itself (rather than the node containing a list).

Robert Paulson
+1  A: 

If your binary search tree is a red black tree, or you intend to any kind of "tree rotation" operations, duplicate nodes will cause problems. Imagine your tree rule is this:

left < root <= right

Now imagine a simple tree whose root is 5, left child is nil, and right child is 5. If you do a left rotation on the root you end up with a 5 in the left child and a 5 in the root with the right child being nil. Now something in the left tree is equal to the root, but your rule above assumed left < root.

I spent hours trying to figure out why my red/black trees would occasionally traverse out of order, the problem was what I described above. Hopefully somebody reads this and saves themselves hours of debugging in the future!

Rich