I'm evaluating the performance of an experimental system setup on an 8-core machine with 16GB RAM. I have two main-memory Java RDBMSs (hsqldb) running, and against each of these I run a TPCC client (derived from jTPCC/BenchmarkSQL).

I have scripts to launch things, so e.g. the hsqldb instances are started with:

./hsqld.bash 0 &
./hsqld.bash 1 &

If I start the clients at nearly the same time:

./hsql-tpcc.bash 0 &
./hsql-tpcc.bash 1 &

then each client shows an initial spike of around 500-1000 tpmC (this is basically transactions per minute), then quickly (in less than a second) settles to a rate of around 200-250 tpmC. On the other hand, if I wait for a second or two before starting the second client:

./hsql-tpcc.bash 0 &
sleep 1
./hsql-tpcc.bash 1 &

then each of the clients runs at 2500+ tpmC. Waiting for more than a second doesn't make any more difference.

This is strange because client 0 talks only to server 0 and client 1 talks only to server 1. It's unclear why there's such dramatic performance interference.

I thought this might be due to CPU scheduler affinity of the clients, but they take only about 1-3% of a single core when running slowly (20-25% when running quickly). Another suspicion was the clients' NUMA bindings (memory contention on the same memory node), but the machine apparently has just 1 memory node (there's only /sys/devices/system/node/node0), and furthermore each client takes just 0.8% of memory.
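The two checks above can be scripted. The following is only a minimal sketch, assuming a Linux system that exposes NUMA nodes under /sys/devices/system/node and a kernel whose /proc/<pid>/status includes a Cpus_allowed_list line. The function names count_numa_nodes and allowed_cpus are made up for illustration.

```shell
#!/usr/bin/env bash
# Count NUMA nodes by counting nodeN directories under sysfs.
# The optional first argument (a hypothetical override, handy for testing)
# is the directory to inspect; it defaults to the usual Linux location.
count_numa_nodes() {
    local base="${1:-/sys/devices/system/node}"
    local n=0
    local d
    for d in "$base"/node[0-9]*; do
        # If the glob matches nothing it stays literal and fails the -d test.
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Print the list of CPUs a process may run on, as reported by the kernel
# in /proc/<pid>/status (assumes the Cpus_allowed_list field is present).
allowed_cpus() {
    local pid="$1"
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$pid/status"
}
```

Running count_numa_nodes with no arguments should print 1 on a machine like the one described here, and allowed_cpus with the PID of each client would show whether the two clients are restricted to the same cores.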

It also doesn't seem due to CPU bindings for the hsqldb instances, since both fast and slow behaviors can be seen just by restarting the clients (and waiting/not waiting for a second), leaving the same hsqldb instances running across both (i.e. hsqldb doesn't have to be restarted). hsqldb takes 4-8% CPU when slow, 80% CPU when fast, and 4.3% mem.

Any other ideas why this could be happening? There's no disk IO involved, and I'm not close to exhausting the system's memory. Thanks in advance. Other relevant info follows:

$ uname -a
Linux hammer.csail.mit.edu 2.6.27.35-170.2.94.fc10.x86_64 #1 SMP Thu Oct 1 14:41:38 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
A:

How long have your two main-memory Java RDBMSs (hsqldb) been running before the test? If you start them right before the test, try warming them up a bit first. Let HotSpot do its thing, and get through all of the if (first_time) { do_initialization(); } code in the DBs so the garbage collector can settle down.

Also, starting two things (no matter what they are) at the same time means that, at a minimum, both are trying to do all of their init work at the same time (allocate memory, allocate pages in swap, find and load libraries, etc.). So both programs spend the first milliseconds of their lives in I/O contention.
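One way to probe the startup-contention idea is to sweep the delay between the two client launches and see where the throughput changes. This is only a sketch: run_pair is a hypothetical helper, it assumes the launcher takes the instance index as its only argument (as ./hsql-tpcc.bash does in the question), and fractional delays assume a sleep that accepts them (GNU sleep does).

```shell
#!/usr/bin/env bash
# Start two instances of a client launcher, the second one DELAY seconds
# after the first, then wait for both to exit.
#   $1: delay in seconds
#   $2: path to the launcher script; it is called with index 0, then 1
run_pair() {
    local delay="$1"
    local script="$2"
    "$script" 0 &
    local pid0=$!
    sleep "$delay"
    "$script" 1 &
    local pid1=$!
    wait "$pid0" "$pid1"
}

# Example sweep (commented out so that sourcing this file runs nothing):
# for d in 0 0.25 0.5 1 2; do
#     echo "delay=$d"
#     run_pair "$d" ./hsql-tpcc.bash
# done
```

If the slowdown comes from both clients initializing at once, the measured rate should change sharply at some small delay rather than gradually.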

Seth
I've tried restarting the hsqldb servers right before starting the clients (i.e. they've been running for near-0 time before the test), and also not restarting them in between tests (i.e. they've been running for many minutes before the test, and gone through several tests). These tests run for at least a minute, so the anomaly is not for lack of warm-up time.
Yang