We have a low latency trading system (feed handlers, analytics, order entry) written in Java. It uses TCP and UDP extensively, it does not use Infiniband or other non-standard networking.

Can anyone comment on the tradeoffs of various OSes or OS configurations to deploy this system? While throughput is obviously important to keep up with modern price feeds, latency is our #1 priority.

Solaris seems like a natural candidate since they created Java; should I use Sparc or x64 processors?

I've heard good things about RHEL and SLERT; are those the right versions of Linux to use in our benchmarking?

Has anyone tested Windows against the above OSes? Or is it assumed to not keep up?

I'd like to leave the Java vs C++ debate for a different thread.

+1  A: 

I'd probably worry about garbage collection causing latency well before the operating system; have you looked into tuning that at all?

If I were willing to spend the time to trial different OSs, I'd try Solaris 10 and NetBSD, and probably a Linux variant for good measure.

I'd experiment with 32- vs 64-bit architectures; 64-bit will give you a larger heap address space, but its wider pointers increase memory footprint and cache pressure.

I'm assuming you've profiled your application and know where the bottlenecks are; by the comment about GC, you've done that. In that case, your application shouldn't be CPU-bound, and chip architecture shouldn't be a primary concern.
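As a rough way to check how much time collections take around a workload, the standard `java.lang.management` beans can be sampled before and after it. This is only a sketch: the class name `GcSnapshot` and the allocation loop are placeholders for illustration, and `getCollectionTime()` reports approximate accumulated collection time, which is not the same thing as individual pause lengths.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {
    /** Approximate accumulated collection time in ms, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) total += millis;
        }
        return total;
    }

    /** Number of collections so far, summed over all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalGcCount();
        long millisBefore = totalGcMillis();

        // Placeholder workload: allocate short-lived garbage. Replace with real code.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] garbage = new byte[1024];
            checksum += garbage.length;
        }

        System.out.println("allocated bytes: " + checksum);
        System.out.println("collections: " + (totalGcCount() - countBefore));
        System.out.println("approx. GC time (ms): " + (totalGcMillis() - millisBefore));
    }
}
```

For individual pause lengths, the JVM's GC logging is the better source; this only shows whether GC is a meaningful share of the run.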

Dean J
We have tuned the GC quite a bit already.
Ted Graham
Is Windows not cutting it? Or, what's the issue you're hitting?
Dean J
This is for a trading system, you ALWAYS want to be faster.
Ted Graham
+10  A: 

Vendors love this kind of benchmark. You have code, right?

IBM, Sun/Oracle, HP will all love to run your app on their gear to demonstrate their advantages.

Make them do this. If you have code, make the vendors run a demonstration on their gear to show which is best for your needs.

It's easy, painless, free, and factual. The final decision will be easy and obvious. And you will know how to install and tune to maximize performance.


What I hate doing is predicting this kind of thing before the code is written. Too many customers have asked for a H/W and OS recommendation before we've finished identifying all the use cases. Asking for that kind of precognition is simple craziness.

But you have code. You can produce test cases that exercise your code. That's perfect.
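A test case for this kind of comparison might look roughly like the sketch below. It times each operation with `System.nanoTime()` and reports percentiles and the worst case, since the question is about latency rather than throughput. The class name `LatencyReport` and the placeholder operation are assumptions for illustration; the real test would call into the trading code.

```java
import java.util.Arrays;

public class LatencyReport {
    /** Nearest-rank percentile (0 < pct <= 100) computed on a sorted copy of the samples. */
    static long percentile(long[] samples, double pct) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** Runs the operation repeatedly and records each call's elapsed time in nanoseconds. */
    static long[] measure(Runnable op, int iterations) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    public static void main(String[] args) {
        // Placeholder operation; substitute one message through the real system.
        Runnable op = () -> Math.sqrt(System.nanoTime());

        measure(op, 100_000); // warm-up so the JIT has compiled the hot path
        long[] samples = measure(op, 1_000_000);

        System.out.println("p50   (ns): " + percentile(samples, 50));
        System.out.println("p99   (ns): " + percentile(samples, 99));
        System.out.println("p99.9 (ns): " + percentile(samples, 99.9));
        System.out.println("max   (ns): " + percentile(samples, 100));
    }
}
```

Handing vendors the same harness and agreeing on which percentile matters keeps the results comparable across their machines.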

S.Lott
How much do you need to be spending on hardware to get this kind of demonstration? We have about 40 servers currently deployed and I guess I thought we were likely too small to do the vendor performance lab thing.
Ted Graham
@Ted Graham. Zero. If they want to sell you hardware, they must demonstrate that it works. They *love* doing this. Start asking your sales people for demos.
S.Lott
This answer is wonderful! The hardware companies would LOVE to try to show off their hardware at their best. Just do yourself one favor: specify exactly what you're measuring BEFORE you give them test code. In particular, you probably want sustained throughput (the number of transactions over a longer period, such as one hour), not the fastest single transaction.
Chip Uni
@Chip: actually, they don't want to focus on throughput: OP stated main concern is latency. So me, I'd focus on _longest_ transactions. But definitely not throughput.
CPerkins
I'm trying to get in touch with IBM and Sun to start such a comparison. IBM's realtime page lists an email address for their RT team, but it bounces. Sun's Global Financial Services page has no way to get in touch with them, and their chat rep has no clue. I've left a couple of messages; we'll see if anyone knowledgeable calls me back.
Ted Graham
@Ted Graham: You have 40 servers? And no sales representatives? How did you get those 40 servers?
S.Lott
@S.Lott: Dell's website ;)
Ted Graham
@Ted Graham: Which vendor did you settle for in the end?
Kynth
+1  A: 

I would strongly recommend that you look into an operating system you already have experience with. Solaris, for example, is a strange beast if you only know Linux.

I would also strongly recommend using a platform actually supported by Sun, as this will make it much easier to get professional assistance when you REALLY, REALLY need it.

http://java.sun.com/javase/6/webnotes/install/system-configurations.html

Thorbjørn Ravn Andersen
Most of our experience is on Windows; although we would be willing to hire a Solaris or Linux admin if needed
Ted Graham
A: 

I don't think managed code environments and real-time processing go together very well. If you really care about latency, remove the layer imposed by the managed code. This is not a Java vs C++ argument, but a Java/C#/... vs C/C++/FORTRAN/... argument, and I believe that is a valid design discussion to have.

And yes, I do mean FORTRAN, we run a number of near real-time systems with a FORTRAN foundation.

cdkMoose
Fortran smortran. You can compile Java to silicon http://cscott.net/Publications/design.ps and cut out the middle man.
Pete Kirkham
JVMs have some performance advantages (e.g., self-profiling at runtime) that would be lost by compiling Java to an exe or building it onto a chip.
Ted Graham
Compiling to silicon voids the OP's original premise of what OS to run on. If it is compiled, the OS question doesn't care about the original source language.
cdkMoose
Certainly if latency is your *overriding* concern, Java isn't the obvious choice - but that horse is out of the barn already, and there are still legitimate reasons to choose Java and push the latency down as much as possible rather than reject Java as an option altogether.
Yishai
I did not mean to rule out Java as an option, only to say that the Java/managed code choice is more important than the selection of an OS for the question of latency. The OP did stipulate that latency was his #1 priority
cdkMoose
-1 for failing to explain why you believe this or how you intend to replace the parts of the system you are removing, e.g. the GC.
Jon Harrop
Really, -1 because my answer isn't complete enough? For what it's worth, I believe that in an application that has significant latency requirements, I don't want a run-time with non-deterministic memory management. I'm not removing the GC from a managed code environment, I am suggesting that the OP evaluate using a managed code environment as compared to doing his own memory management in a compiled language. This should be an early design discussion.
cdkMoose
A: 

One way to manage latency is to have several JVMs dividing the work with smaller heaps, so that a stop-the-world garbage collection isn't as time-consuming when it happens and affects fewer processes.

Another approach is to load up a cluster of JVMs with enough memory and allocate the processes to ensure there won't be a stop-the-world garbage collection during the hours you care about latency (if this isn't a 24/7 app), and restart the JVMs during off hours.

You should also look at other JVM implementations as a possibility (such as JRockit). Of course, whether any of them is appropriate depends entirely on your specific application.

If any of the above matters to your approach, it will affect the choice of OS. For example, if you go with another JVM implementation, that might limit OS choices, and if you go with clustering or otherwise running several JVMs for the application, that might require better underlying OS tools to manage effectively, further influencing the OS choice.
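To make the several-small-JVMs idea concrete, here is a minimal sketch that builds the command line for each worker JVM with a fixed, small heap. The class name `WorkerLauncher`, the main class `com.example.FeedHandler`, the shard argument, and the 512m heap size are all hypothetical; `-Xms` and `-Xmx` are the standard heap-size options.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class WorkerLauncher {
    /** Builds the command for one worker JVM with a fixed heap (initial size = maximum size). */
    static List<String> workerCommand(String mainClass, int shard, String heap) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xms" + heap); // same initial and max heap avoids resizing at runtime
        cmd.add("-Xmx" + heap);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.add(Integer.toString(shard)); // hypothetical argument telling the worker its share
        return cmd;
    }

    public static void main(String[] args) {
        for (int shard = 0; shard < 4; shard++) {
            // com.example.FeedHandler is a made-up class name for illustration.
            List<String> cmd = workerCommand("com.example.FeedHandler", shard, "512m");
            System.out.println(String.join(" ", cmd));
            // To actually start the worker: new ProcessBuilder(cmd).inheritIO().start();
        }
    }
}
```

How the work is split between the workers (by symbol, by feed, and so on) is application-specific and not shown here.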

Yishai
+1  A: 

For a trading environment, you are probably concerned about consistency as well as low latency, so focusing on reducing the impact of GC pauses as much as possible may well give you more benefit than different OS choices.

  • The G1 garbage collector in recent versions of Sun's HotSpot VM reduces stop-the-world pauses a lot, in a similar way to the JRockit VM.
  • For real performance guarantees, though, Azul Systems' version of HotSpot on their Java appliance delivers the lowest guaranteed pauses available. It also scales to a massive size: 100s of GB of heap and 100s of cores.
  • I'd discount Java Realtime: although you'd get guarantees of response, you'd sacrifice throughput to get those guarantees.

However, if you're planning on using your trading system in an environment where every microsecond counts, you're really going to have to live with the lack of consistency you will get from the current generation of VMs: none of them (except realtime) guarantees low-microsecond GC pauses. Of course, at this level you're going to run into the same issues from OS activity (process pre-emption, interrupt handling, page faults, etc.). In this case, one of the real-time variants of Linux is going to help you.
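For G1, the pause goal is set on the command line, for example `java -XX:+UseG1GC -XX:MaxGCPauseMillis=5 PauseTargetCheck`. It is a target the collector tries to meet, not a guarantee. The sketch below just reads back what the running JVM was started with; the class name `PauseTargetCheck` and the value 5 are assumptions for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class PauseTargetCheck {
    static final String FLAG = "-XX:MaxGCPauseMillis=";

    /** Returns the pause target given on the command line in ms, or -1 if the flag is absent. */
    static long pauseTargetMillis(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                return Long.parseLong(arg.substring(FLAG.length()));
            }
        }
        return -1; // flag not given; the collector's default applies
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long target = pauseTargetMillis(jvmArgs);
        System.out.println(target < 0
                ? "no explicit pause target on the command line"
                : "pause target (ms): " + target);

        // List the collectors this JVM is actually running with.
        ManagementFactory.getGarbageCollectorMXBeans()
                .forEach(gc -> System.out.println("collector: " + gc.getName()));
    }
}
```

Comparing the configured target against measured worst-case latency shows how far the soft goal is from what the application actually sees.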

Robert Christie
I'm less concerned with GC pauses than with what you classified as "OS Activity". I believe Azul's appliances do not provide sub-millisecond latency.
Ted Graham
OK. No VMs on the market provide sub-ms latency. The G1 collector allows you to specify the target maximum pause in ms, but the examples always stay in the ms range. When you get to the point that OS activity is causing you problems, you'll get a really big jump in performance from InfiniBand-based RDMA: on the order of 10 usec rather than around 70 usec for 10GigE for a transfer.
Robert Christie
I agree no VM provides sub-millisecond latency guarantees for a GC, but I believe that Azul doesn't provide sub-ms latency for processing events when GC is not involved. I got that from http://blogs.azulsystems.com/cliff/webtech/page/2/ (look at the Oct 28, 2008 entry).
Ted Graham
@cb160: What do you know about doing infiniband RDMA from java?
Ted Graham
@Ted One of the features of JDK7 is the Sockets Direct Protocol, which provides RDMA over InfiniBand. See http://java.sun.com/docs/books/tutorial/sdp/sockets/index.html
Robert Christie
+2  A: 

I wouldn't rule out Windows from this just because it's Windows. My experience over the last few years has been that the Windows version of the Sun JVM was usually the most mature performance-wise, in contrast to Linux or Solaris x86 on the same hardware. The JVM for Solaris SPARC may be good too, but I guess with Windows on x86 you'll get more power for less money.

x4u
Our benchmarking has shown Windows to be very competitive so far. We're about to start further testing, which is why I asked my question.
Ted Graham