views:

29

answers:

1

I have created an archive table which will store data for selecting only.

Daily there will be a program to transfer a batch of records into the archive table. There are several columns which are indexed; while others are not.

I am concerned with time cost per batch insertion:
- 1st batch insertion: N1
- 2nd batch insertion: N2
- 3rd batch insertion: N3

The question is: will N1, N2, and N3 roughly be the same, or N3 > N2 > N1?

That is, will the time cost be a constant or incremental, with existence of several indexes?

All indexes are non-clustered.

The archive table structure is this:

create table document (
   doc_id   int unsigned primary key,
   owner_id int,  -- indexed
   title    smalltext,
   country  char(2),
   year     year(4),
   time     datetime,

   key ix_owner(owner_id)
}
+2  A: 

Cost will be incremental, O(log(n)).

In practice, you will have a step on graph when the index size reaches a certain threshold and the index pages will stop fitting into the cache

The size of the cache is defined by key_buffer_size in MyISAM and innodb_buffer_pool_size in InnoDB.

Before this threshold, the cost will be proportional to the memory seek time, after the threshold, the cost will be proportional to the disk seek time (all multiplied by the log(n) of course)

Quassnoi
Can you provide some explanation?
SiLent SoNG
@Silent Song: the cost of inserting a record into a `B-Tree` index varies according to the number of page traversals that are needed to locate the record, plus the number of the page splits. Page split cost is constant, the number of traversals increases at `O(log(n))`. As your index grows larger, you may also observe a step on the graph when the index pages stop fitting into the cache and the page splits and page traversals will require disk reads.
Quassnoi
Thanks for your explanation!
SiLent SoNG