I have the following setup:

Source: RAID0 (4 disks) with a Tokyo Cabinet hash DB containing ~100,000 key-value pairs, where the values are in compressed format and each value consists of ~100 'sub-values' (values are ~1.5 MB on average).

Target: RAID0 (4 disks) with an empty Tokyo Cabinet DB, tuned with: #bnum=20000000#opts=l#xmsiz=268435456#thnum=10
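As a sanity check on those tuning values, here is a back-of-envelope calculation of the bucket array size versus the mapped region. The 8-bytes-per-bucket figure is my assumption for the 'l' (large, 64-bit bucket array) option, not something taken from the question:

```python
# Rough sizing check (assumed figures, not authoritative Tokyo Cabinet internals):
# the hash DB keeps a bucket array; with opts=l ("large"), each bucket entry
# is assumed to be 8 bytes instead of 4.
bnum = 20_000_000          # bucket count from the tuning string
bucket_bytes = 8           # assumed bytes per bucket with the 'l' option
xmsiz = 268_435_456        # extra mapped memory from the tuning string (256 MiB)

bucket_array = bnum * bucket_bytes      # total bucket array size in bytes
fits_in_xmsiz = bucket_array <= xmsiz   # does the bucket array fit in xmsiz?

print(bucket_array, fits_in_xmsiz)      # prints: 160000000 True
```

Under these assumptions the bucket array (~153 MiB) still fits inside the 256 MiB xmsiz, so the bucket lookups themselves should stay in memory.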

The goal is to split all 100,000 values from the source DB into 100,000 × 100 → 10 million sub-values and store them in the target DB, each under its own key. Keys in the target DB are guaranteed to be unique (the application enforces it).
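The read/split/insert loop could be sketched like this. Plain dicts stand in for the Tokyo Cabinet handles, and `split_value()` is a hypothetical stand-in for the application's decompress-and-split step (the real sub-key scheme is not given in the question):

```python
# Sketch of the split/copy loop, with dicts standing in for the DB handles.

def split_value(key, value):
    # Hypothetical stand-in: yield (sub_key, sub_value) pairs. The real
    # application decompresses `value` and derives ~100 unique sub-keys
    # per source key; here we just split on a delimiter.
    for i, part in enumerate(value.split(b"|")):
        yield (b"%s:%d" % (key, i), part)

def copy_split(source, target):
    for key, value in source.items():
        for sub_key, sub_value in split_value(key, value):
            target[sub_key] = sub_value   # with pytc this would be a put()

source = {b"k1": b"a|b|c", b"k2": b"x|y"}
target = {}
copy_split(source, target)
# target now holds 5 sub-values under unique keys
```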

I use pytc (the Python binding for Tokyo Cabinet), which should be pretty efficient.

In the beginning all is OK (~50 values/sec read/split, i.e. ~5,000 'sub-value' inserts/sec). Not really fast, but it would do. After a while, though (cache full?), iostat shows the TARGET RAID system starting to produce a lot of reads, and from then on performance is bad: instead of 5,000 inserts/sec, throughput goes down to about 400-500 inserts/sec (a ten-fold decrease).
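For scale, the question's own numbers imply a total data volume far larger than any in-memory cache, which would be consistent with the target DB having to read pages back from disk once the cache fills. A rough calculation (using the compressed source sizes; the uncompressed target may well be larger):

```python
# Data-volume arithmetic from the numbers in the question.
n_values = 100_000           # source key-value pairs
avg_value_mb = 1.5           # average (compressed) value size
subvalues_per_value = 100    # sub-values per source value

total_gb = n_values * avg_value_mb / 1024                     # ~146 GB total
n_subvalues = n_values * subvalues_per_value                  # 10,000,000 records
avg_subvalue_kb = avg_value_mb * 1024 / subvalues_per_value   # ~15 KB each

print(n_subvalues, round(total_gb), round(avg_subvalue_kb))
```

So the target ends up holding ~10 million records totalling on the order of 150 GB, which no 256 MiB xmsiz or OS page cache can hold.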

I have read comments from other people reaching a performance wall, but not at the (relatively) small number of items I'm looking at. Is it expected to see lots of reads from the target IO subsystem?

More generally: any clues or guesses as to where to look for the problem, or is this all just within normal ranges?

Thanks, Geert-Jan