ansaurus

Question

Why does searching an index have logarithmic complexity?

Answer 1

+3 A:

Database Indices are usually (almost always) stored as B-Trees. And all balanced tree structures have O(log n) complexity for searching.

'Dictionary' is an 'Abstract Data Type' (ADT), ie it is a functional description that does not designate an implementation. Some dictionaries could use a Hashtable for O(1) lookup, others could use a tree and achieve O(log n).

The main reason a DB uses B-trees (over any other kind of tree) is that B-trees are self-balancing and are very 'shallow' (requiring little disk I/O)

Henk Holterman 2010-03-15 11:57:43

All balanced trees do. A sufficiently degenerate tree is a linked list.

Vatine 2010-03-15 12:09:15

@Vatine: You're right, I'll edit that in.

Henk Holterman 2010-03-15 12:16:46

Answer 2

+3 A:

One of the only data structures you can access immediately with a key is a vector, which for a massive amount of data, becomes inefficient when you start inserting and removing elements. It also needs contiguous memory allocation.

A hash can be efficient but needs more space and will end up with collisions.

A B tree has a good balance between search performance and space.

Andres 2010-03-15 12:00:36

Answer 3

+5 A:

Dictionaries are not implicitly sorted, B-Trees are.

A B-Tree index can be used for ranged access, like this:

WHERE col1 BETWEEN value1 AND value2

or ordering, like this:

ORDER BY col1

You cannot immediately access a page in a B-Tree index: you need to traverse the child pages whose number increases logarithmically.

Some databases support dictionary-type indexes as well, namely, HASH indexes, in which case the search time is constant. But such indexes cannot be used for ranged access or ordering.

Quassnoi 2010-03-15 12:01:35

A dictionary could be sorted. It just doesn't have to be.

Henk Holterman 2010-03-15 12:04:01

@Henk: corrected to clarify this.

Quassnoi 2010-03-15 12:05:05

@Henk: I think dictionaries are referring to hashtables with O(1) access. A dictionary could be sorted, but to do that sorting you will either have some linear structure (i.e. O(N) queries) or a tree structure (O(logN)) underneath.

Il-Bhima 2010-03-15 12:07:03

@Il-Bhima: I know of Dictionary classes that are sorted.

Henk Holterman 2010-03-15 12:08:37

@Henk: Ok, but how are they implemented? I would be very surprised (please tell me if this is so) to find a hash-table which can sort data without having some underlying ordered data-structure. You can define a Dictionary class interface on a tree, but that doesn't make it a dictionary in the usual sense.

Il-Bhima 2010-03-15 12:12:47

@Il-Bhima: Sorted dictionaries will usually use a Tree.

Henk Holterman 2010-03-15 12:18:59

Quassnoi: Sorry, I misread the child-pages part.

Henk Holterman 2010-03-15 12:19:45

Answer 4

+1 A:

hashindex (eg. in mysql and postgres) has constant complexity (O(1)) for search.

 CREATE INDEX ... USING HASH

osgx 2010-03-15 12:02:15

wouldn't it have constant time complexity? Linear is the worst possible, i.e. searching without an index.

Il-Bhima 2010-03-15 12:16:08

@Il-Bhima, oh.. yes :) this was a some type of mind typo.

osgx 2010-03-15 19:53:27

Answer 5

+2 A:

If your only queries are equality tests then, its true, dictionaries are a good choice since they will do lookups in amortized O(1) time. However, if you want to extend queries to involve range checks, eg (select * from students where age > 10) then suddenly your dictionaries lose their advantage completely.. This is where tree-based indexes come in. With a tree-based index you can perform equalities and range checks in logarithmic time.

There is one problem with naive tree structures. They get unbalanced, this means that after adding certain values to the index, the tree structure become lopsided (ex like a long line) and lookups start to take O(N) time again. This can be resolved by balancing your tree. The B-Tree is one such approach which also takes advantage of systems capable of doing large blocks of I/O, and so is most appropriate for databases.

Il-Bhima 2010-03-15 12:03:41

Answer 6

+1 A:

You can achieve O(1) if you pre-allocate N entries an array and hash the key to this N values.

But then if you have more than N entries stored there are collision. So for each key in the array you have a list of value. So it's not exactly O(1) anymore. Scanning the list itself will be O(m) where m is the average number of collision.

Example with hash = n mod 3
0  -->  [0,a] [3,b] ...
1  -->  [1,a] [4,b] [7,b] ...
2  -->  [2,a] [5,b] ...

At a point in time, it becomes some bad that you spend more time traversing the list of value for a potential key than using another structure with O(log n) lookup time, where n is the total number of entries.

You could of course pick N so big that the array/hash would be faster than the B-Tree. But the array has a fixed pre-allocated size. So if N=1000 and you store 3 values, you've wasted 997 slots in the array.

So it's essentially a performance-space trade-off. For small set of value, array and hashing is excellent. For large set of value, B-Tree are most efficient.

ewernli 2010-03-15 12:06:17

Answer 7

+1 A:

Hashes are the fastest look up data structures, but have some problems:

a) are not sorted b) no matter how good the hash is, will have collisions, that becomes problematic when lots of data c) to find a hash value in a hash indexed file takes a long time, so most of the time hashes make sense only for in memory (RAM) data - which makes them not suitable for databases- that most of the time cannot fit all data in RAM

Sorted trees address these issues, and b-trees operations in particular can be implemented efficiently using files. The only drawback is they have slower lookup times as a hash structure.

No data structure is perfect in all cases, depending on estimated size of data and how you use it, one is better.

pt 2010-03-15 12:19:56

ansaurus

tags:

views:

answers:

Why does searching an index have logarithmic complexity?

related questions