views:

76

answers:

3

If I have a table column with data and create an index on this column, will the index take same amount of disc space as the column itself?

I'm interested because I'm trying to understand if b-trees actually keep copies of column data in leaf nodes or they somehow point to it?

Sorry if this a "Will Java replace XML?" kind question.

UPDATE:

created a table without index with a single GUID column, added 1M rows - 26MB

same table with a primary key (clustered index) - 25MB (even less!), index size - 176KB

same table with a unique key (nonclustered index) - 26MB, index size - 27MB

So only nonclustered indexes take as much space as the data itself.

All measurements were done in SQL Server 2005

+1  A: 

The B-Tree points to the row in the table, but the B-Tree itself still takes some space on disk.

Some database, have special table which embed the main index and the data. In Oracle, it's called IOT -- index-organized table.

Each row in a regular table can be identified by an internal ID (but it's database specific) which is used by the B-Tree to identify the row. In Oracle, it's called rowid and looks like AAAAECAABAAAAgiAAA :)

If I have a table column with data and create an index on this column, will the index take same amount of disc space as the column itself?

In a basic B-Tree, you have the same number of node as the number of item in the column.

Consider 1,2,3,4:

    1 
  / 
2
   \ 3 
      \ 4

The exact space can still be a bit different (the index is probably a bit bigger as it need to store links between nodes, it may not be balanced perfectly, etc.), and I guess database can use optimization to compress part of the index. But the order of magnitude between the index and the column data should be the same.

ewernli
Is it kind of clustered index?
Anton
I think clustered indexes store the actual data, because clustered indexes are sorted in same way as data is.
Valentin Vasiliev
Yes, IOT is similar to a clustered index. The row in the table are physically re-ordered. Excellent performance to query data, but slower to insert.
ewernli
+3  A: 

I'm almost sure it's quite a DB dependent, but generally – yeah, they take additional space. This happens because of two reasons:

  1. This way you can utilize the fact the data in BTREE leafs is sorted;

  2. You gain lookup speed advantage as you don't have to seek back and forth to fetch neccessary stuff.

PS just checked our mysql server: for a 20GB table indexes take 10GB of space :)

Anton
A: 

Judging by this article, it will, in fact, take at least the same amount of space as the data in the column (in PostgreSQL, anyway). The article also goes to suggest a strategy to reduce disk and memory usage.

A way to check for yourself would be to use e.g. the derby DB, create a table with a million rows and a single column, check it's size, create an index on the column and check it's size again. If you take the 10-15 minutes to do so, let us know the results. :)

Tomislav Nakic-Alfirevic