So, it seems to me that a query on a table with 10k records and a query on a table with 10 million records are almost equally fast if they both fetch roughly the same number of records and make good use of simple indexes (an auto-increment, record-ID type of indexed field).
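For concreteness, here is the kind of indexed lookup I mean, as a minimal Postgres-flavoured sketch (the table and column names are made up); EXPLAIN is just a way to confirm the planner actually uses the index:

    -- Made-up table with an auto-increment primary key:
    CREATE TABLE events (
        id      BIGSERIAL PRIMARY KEY,
        payload TEXT
    );

    -- A single-row lookup by primary key; EXPLAIN should report an
    -- "Index Scan" rather than a sequential scan over the whole table.
    EXPLAIN SELECT * FROM events WHERE id = 12345;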

My question is: will this extend to a table with close to 4 billion records, if it is indexed properly and the database is set up in such a way that queries always use those indexes effectively?

Also, I know that inserting new records into a very large indexed table can be very slow, because all the indexes have to be recalculated. If I add new records only to the end of the table, can I avoid that slowdown, or will that not work because the index is a binary tree and a large chunk of the tree will still have to be recalculated?

Finally, I looked around a bit for an FAQ or list of caveats about working with very large tables, but couldn't really find one, so if anyone knows of something like that, a link would be appreciated.

A: 

Here is some good reading about large tables and the effects of indexing on them, including cost/benefit, as you requested:

http://www.dba-oracle.com/t_indexing_power.htm

Michael Goldshteyn
That referenced article does not give an in-depth investigation into indexing very large tables. It simply discusses the basics of indexes.
Mitch Wheat
A: 

Indexing very large tables (as with anything database-related) depends on many factors, including your access patterns, the ratio of reads to writes, and the amount of available RAM.

If you can fit your 'hot' index pages (i.e. the frequently accessed ones) into memory, then accesses will generally be fast.
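One way to sanity-check whether your hot index pages are being served from memory, as a sketch in Postgres (other engines expose equivalent counters); pg_statio_user_indexes is a built-in statistics view:

    -- Fraction of index page reads satisfied by the buffer cache;
    -- a ratio near 1.0 suggests the hot index pages fit in memory.
    SELECT indexrelname,
           idx_blks_hit,
           idx_blks_read,
           round(idx_blks_hit::numeric
                 / nullif(idx_blks_hit + idx_blks_read, 0), 3) AS cache_hit_ratio
    FROM pg_statio_user_indexes
    ORDER BY idx_blks_read DESC;

Note that these counters accumulate from the last statistics reset, so they reflect your workload over time rather than a single query.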

The usual strategy for indexing very large tables is to use partitioned tables and partitioned indexes. BUT if your query does not join or filter on the partition key, there will be no improvement in performance over an unpartitioned table, i.e. no partition elimination. (A sketch follows the links below.)

SQL Server Database Partitioning Myths and Truths

Oracle Partitioned Tables and Indexes
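As a rough illustration of partition elimination, here is a Postgres-flavoured sketch (the table and column names are hypothetical; SQL Server and Oracle use different syntax, per the links above):

    -- Hypothetical range-partitioned table; queries that filter on the
    -- partition key (created_at) can skip irrelevant partitions entirely.
    CREATE TABLE orders (
        id         BIGINT NOT NULL,
        created_at DATE   NOT NULL,
        amount     NUMERIC
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE orders_2023 PARTITION OF orders
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
    CREATE TABLE orders_2024 PARTITION OF orders
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

    -- This filter includes the partition key, so only orders_2024 is scanned:
    SELECT sum(amount) FROM orders WHERE created_at >= '2024-06-01';

    -- This one does not, so every partition must be scanned:
    SELECT sum(amount) FROM orders WHERE id = 42;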

It's very important to keep your indexes as narrow as possible.

Kimberly Tripp's The Clustered Index Debate Continues...(SQL Server)
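As a hedged illustration of what 'narrow' means in practice (hypothetical table and columns): the fewer and smaller the indexed columns, the more entries fit on each page, so the tree is shallower and more of it stays cached.

    -- Narrow: one small column, many entries per page, shallow tree.
    CREATE INDEX idx_sales_customer ON sales (customer_id);

    -- Wide: several columns per entry, fewer entries per page,
    -- a larger (and possibly deeper) tree and more I/O per lookup.
    CREATE INDEX idx_sales_wide ON sales (customer_id, status, note, created_at);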

Mitch Wheat
A: 

Accessing the data via a unique index lookup will slow down as the table gets very large, but not by much. In Postgres the index is stored as a B-tree structure (not a binary tree, which has only two children per node), so a 10k-row table might have 2 levels whereas a 10-billion-row table might have 4 levels (depending on the width of the index entries). So as the table gets ridiculously large it might go to 5 levels or higher, but that only means one extra page read per lookup, so it is probably not noticeable.
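To make those level counts concrete, here is a back-of-the-envelope check plus a way to read the real height in Postgres (the fan-out of 400 entries per page is an assumption, plausible for a narrow key, and 'events_pkey' is a made-up index name):

    -- Very rough height estimate: levels is about ceil(log_fanout(row_count)),
    -- assuming ~400 index entries fit on one page.
    SELECT ceil(log(400::numeric, 10000::numeric))        AS levels_10k,   -- 2
           ceil(log(400::numeric, 10000000000::numeric))  AS levels_10bil; -- 4

    -- With the pageinspect extension you can read the actual height of a
    -- B-tree index from its metapage:
    CREATE EXTENSION IF NOT EXISTS pageinspect;
    SELECT level FROM bt_metap('events_pkey');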

When you insert new rows, you can't control where they are placed in the physical layout of the table, so I assume you mean "end of the table" in terms of inserting the maximum value of the indexed column. I know Oracle has some optimisations around leaf block splitting in this case, but I don't know about Postgres.
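For what it's worth, here is a sketch of the append-only pattern the question describes (Postgres-flavoured, hypothetical names). In any B-tree, an insert only touches the pages along the path to one leaf, so a monotonically increasing key concentrates all new entries on the rightmost leaf rather than "recalculating" the whole tree:

    -- Auto-increment key: every new row gets the current maximum + 1,
    -- so new index entries always land on the rightmost leaf page.
    CREATE TABLE log_entries (
        id      BIGSERIAL PRIMARY KEY,
        message TEXT
    );

    -- Each insert modifies one leaf page (plus parent pages on the
    -- occasional page split), regardless of how large the table is.
    INSERT INTO log_entries (message) VALUES ('append-only row');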

MrDBA