I am testing the performance of a data streaming system that supports continuous queries.

This is how it works:
- There is a polling service which sends data to my system.
- As data passes into the system, each query is evaluated over a window of the stream at the current time.
- The window slides as data passes in.
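To make the setup concrete, here is a minimal sketch of a time-based sliding window with several continuous queries attached. It is not the asker's system: the class `SlidingWindowQueries`, the `Event` type and the example queries are hypothetical, and it assumes events arrive in timestamp order and that the window length is positive.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: a time-based sliding window shared by several continuous queries.
public class SlidingWindowQueries {
    /** One stream element: an event time in milliseconds and a numeric value. */
    public static final class Event {
        public final long timeMillis;
        public final double value;
        public Event(long timeMillis, double value) { this.timeMillis = timeMillis; this.value = value; }
    }

    private final long windowMillis;
    private final Deque<Event> window = new ArrayDeque<>();
    private final List<ToDoubleFunction<Deque<Event>>> queries = new ArrayList<>();

    public SlidingWindowQueries(long windowMillis) {
        if (windowMillis <= 0) throw new IllegalArgumentException("window must be positive");
        this.windowMillis = windowMillis;
    }

    /** Registers a continuous query that is re-evaluated over the window on every event. */
    public void addQuery(ToDoubleFunction<Deque<Event>> query) { queries.add(query); }

    /** Appends an event (assumed to be in time order), slides the window, and evaluates every query. */
    public double[] onEvent(long timeMillis, double value) {
        window.addLast(new Event(timeMillis, value));
        // Slide: keep only events with time > now - windowMillis.
        while (window.peekFirst().timeMillis <= timeMillis - windowMillis) {
            window.removeFirst();
        }
        double[] results = new double[queries.size()];
        for (int i = 0; i < queries.size(); i++) {
            results[i] = queries.get(i).applyAsDouble(window);
        }
        return results;
    }

    public static void main(String[] args) {
        SlidingWindowQueries s = new SlidingWindowQueries(1_000);
        s.addQuery(w -> w.size());                                   // count in window
        s.addQuery(w -> w.stream().mapToDouble(e -> e.value).sum()); // sum in window
        s.onEvent(0, 1.0);
        s.onEvent(500, 2.0);
        double[] r = s.onEvent(1_200, 4.0); // the event at t=0 has slid out
        System.out.println("count=" + r[0] + " sum=" + r[1]);
    }
}
```

In this naive form every arriving event costs work proportional to the number of queries times the window size, which is why one would expect throughput to fall as queries are added.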

My problem is this: when I add more queries to the system, I expect the throughput to decrease because the system can't cope with the data rate.

However, I actually observe an increase in throughput.

I can't understand why this is the case; my guess is that it has something to do with the way the JVM allocates CPU, memory, etc.

Can anyone shed any light on my problem?

+2  A: 

As always, the answer is to profile. Just to offer a guess, though: the HotSpot VM needs quite a few passes over a piece of code before it starts doing its JIT magic.

Hank Gay
+1 - That would be my guess too.
Stephen C
+2  A: 

Your question is very light on technical details, but here's a guess.

If the IO streaming subsystem is reasonably efficient (e.g. select-based) and an individual client doesn't saturate the network interface, then the existence of many clients could increase the total throughput simply because the server process can handle more data than any single client sends.
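As a rough illustration of "select-based", the sketch below uses the standard `java.nio` `Selector` to let one thread drain many independent sources. To stay self-contained it uses in-process `Pipe`s in place of network sockets; the class `SelectDemo`, the writer threads acting as "clients" and the byte counts are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SelectDemo {
    /**
     * Starts `clients` writer threads (hypothetical clients), each pushing `bytesPerClient` bytes
     * into its own Pipe, and drains all of them from ONE thread via a single Selector.
     * Returns the total number of bytes read.
     */
    public static long drainAll(int clients, int bytesPerClient) {
        List<Thread> writers = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < clients; i++) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false); // required before register()
                pipe.source().register(selector, SelectionKey.OP_READ);
                Thread writer = new Thread(() -> {
                    // Write zeros, then close the sink so the reader sees end-of-stream.
                    try (Pipe.SinkChannel sink = pipe.sink()) {
                        ByteBuffer out = ByteBuffer.allocate(bytesPerClient);
                        while (out.hasRemaining()) sink.write(out);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                writers.add(writer);
                writer.start();
            }
            long total = 0;
            int open = clients;
            ByteBuffer in = ByteBuffer.allocate(8192);
            while (open > 0) {
                selector.select(); // block until at least one source is readable
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    Pipe.SourceChannel source = (Pipe.SourceChannel) key.channel();
                    in.clear();
                    int n = source.read(in);
                    if (n < 0) {        // writer closed its end
                        source.close(); // closing the channel also cancels its key
                        open--;
                    } else {
                        total += n;
                    }
                }
            }
            for (Thread writer : writers) writer.join();
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(drainAll(8, 1 << 20) + " bytes drained by one thread");
    }
}
```

The point of the design is that the reading thread never blocks on a single slow source, so adding sources adds data to process rather than idle waiting, up to the point where that thread (or the network) is saturated.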

maerics
+2  A: 

Most Java Virtual Machines initially interpret the JVM bytecode, which is slower than executing native machine code. As the JVM discovers that you are using a particular section of code repeatedly, it compiles that section into native machine code (increasing its processing speed). As a result, stress testing code, or even leaving it running for longer, sometimes speeds up execution instead of slowing it down. The HotSpot JVM (the default one from Sun) is the best-known JVM that performs this native compilation to speed up code execution.
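One way to observe this warm-up effect yourself is to time the same work in successive batches. The sketch below is illustrative only: `WarmupDemo` and its `work` method are hypothetical stand-ins for evaluating a query, and the exact timings depend entirely on your JVM and machine. Running it with HotSpot's `-XX:+PrintCompilation` flag shows roughly when `work` gets compiled, which typically lines up with the point where batch times drop.

```java
public class WarmupDemo {
    // Written after each batch so the JIT cannot treat the computed work as dead code.
    static volatile long blackhole;

    // Hypothetical CPU-bound stand-in for "evaluate one query over the window".
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    /** Times `batches` consecutive batches of `callsPerBatch` calls; returns nanoseconds per batch. */
    public static long[] timeBatches(int batches, int callsPerBatch) {
        long[] elapsed = new long[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            long acc = 0;
            for (int c = 0; c < callsPerBatch; c++) {
                acc += work(1_000);
            }
            elapsed[b] = System.nanoTime() - start;
            blackhole = acc;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] t = timeBatches(20, 2_000);
        for (int b = 0; b < t.length; b++) {
            // Early batches usually run slower (interpreted); later ones use compiled code.
            System.out.println("batch " + b + ": " + t[b] / 1_000 + " us");
        }
    }
}
```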

Also, many Java libraries are very mature compared to some libraries you may have encountered in the past. That means that instead of allocating a new thread for each request, they might be using non-blocking listeners on sockets, thread pools of reusable worker threads, or any number of other techniques suited to high-throughput processing. This, coupled with the self-tuning of a JIT-compiling (HotSpot-like) JVM, makes benchmarking Java quite a challenge. Generally speaking, things tend to get faster the longer they run, up to a point.
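The "thread pools of reusable worker threads" idea can be sketched with the standard `ExecutorService` API. Here `PoolDemo` and its trivial jobs are hypothetical stand-ins for per-request work; the sketch only shows that many tasks are served by a small, fixed set of threads rather than one new thread per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    /**
     * Runs `tasks` small jobs (hypothetical stand-ins for per-request work) on a fixed pool of
     * `workers` threads and returns how many distinct threads actually executed them.
     */
    public static int distinctWorkerThreads(int tasks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final long n = i;
                jobs.add(() -> {
                    threadNames.add(Thread.currentThread().getName()); // record which worker ran this job
                    return n * n;
                });
            }
            for (Future<Long> f : pool.invokeAll(jobs)) {
                f.get(); // surface any task failure
            }
            return threadNames.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("1000 tasks ran on " + distinctWorkerThreads(1000, 4) + " threads");
    }
}
```

Reusing a bounded set of threads avoids per-request thread creation cost, which is one reason such libraries can keep up as load grows.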

Edwin Buck
A: 

Measuring the performance of a Java application, and especially microbenchmarking (benchmarking a very small piece of code), can be very hard because the JVM, the JIT compiler and the garbage collector can have a large and hard-to-predict influence on the performance of the program.
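A common mitigation is to run the code untimed for a while before measuring, and to report a robust statistic such as the median. The sketch below is a hand-rolled illustration with hypothetical names (`WarmThenMeasure`, `medianNanos`) and arbitrary iteration counts; it does not control for GC pauses or other JVM effects, which is why a dedicated harness such as OpenJDK's JMH is generally preferred for real microbenchmarks.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

public class WarmThenMeasure {
    // Results are written here so the JIT cannot discard the benchmarked work.
    static volatile long blackhole;

    /** Runs `task` warmupIters times untimed, then returns the median of `measuredIters` timed runs in ns. */
    public static long medianNanos(LongSupplier task, int warmupIters, int measuredIters) {
        if (measuredIters <= 0) throw new IllegalArgumentException("need at least one measured iteration");
        for (int i = 0; i < warmupIters; i++) {
            blackhole = task.getAsLong(); // warm-up: give the JIT a chance to compile the hot path
        }
        long[] samples = new long[measuredIters];
        for (int i = 0; i < measuredIters; i++) {
            long start = System.nanoTime();
            blackhole = task.getAsLong();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[measuredIters / 2]; // median is less sensitive to outliers such as GC pauses
    }

    public static void main(String[] args) {
        LongSupplier sumTask = () -> {
            long s = 0;
            for (int i = 0; i < 100_000; i++) s += i;
            return s;
        };
        System.out.println("median ns: " + medianNanos(sumTask, 10_000, 101));
    }
}
```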

There's an excellent series of articles "Java theory and practice" by Java concurrency and performance guru Brian Goetz:

Jesper
A: 

Another, dumber theory: well, of course throughput increases as load increases, at least until you hit capacity. If you can handle 100 queries per second on average and you send 10 per second on average, throughput is 10 queries per second. If you increase the load to 100 qps on average, throughput is (nearly) 100 qps. It gets worse after that, of course. Are you not near capacity? Sorry if this is something you've surely ruled out.
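The arithmetic above can be checked with a tiny idealized model: each second some queries arrive and at most "capacity" of them complete, with the rest queued. `LoadModel` and its parameters are hypothetical, and the model is deliberately simplistic (a real system past capacity usually degrades further rather than holding steady), but it shows observed throughput tracking offered load until capacity is reached, after which only the backlog grows.

```java
public class LoadModel {
    /** Outcome of the simulation: average completions per second and work still queued at the end. */
    public static final class Result {
        public final double throughput;
        public final long backlog;
        Result(double throughput, long backlog) { this.throughput = throughput; this.backlog = backlog; }
    }

    /** Idealized server: each second `offeredPerSec` queries arrive and at most `capacityPerSec` complete. */
    public static Result simulate(int offeredPerSec, int capacityPerSec, int seconds) {
        long backlog = 0;
        long completed = 0;
        for (int t = 0; t < seconds; t++) {
            backlog += offeredPerSec;                         // new queries arrive
            long served = Math.min(backlog, capacityPerSec); // serve up to capacity
            backlog -= served;
            completed += served;
        }
        return new Result((double) completed / seconds, backlog);
    }

    public static void main(String[] args) {
        for (int load : new int[] {10, 100, 150}) {
            Result r = simulate(load, 100, 60);
            System.out.println("offered " + load + " qps -> throughput " + r.throughput + " qps, backlog " + r.backlog);
        }
        // offered 10 qps -> throughput 10.0 qps, backlog 0
        // offered 100 qps -> throughput 100.0 qps, backlog 0
        // offered 150 qps -> throughput 100.0 qps, backlog 3000
    }
}
```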

Sean Owen