It seems that all the major investment banks use C++ in Unix (Linux, Solaris) for their low latency/high frequency server applications. Why is Windows generally not used as a platform for this? Are there technical reasons why Windows can't compete?
Technically, no. However, there is a very simple business reason: the rest of the financial world runs on Unix. Banks run on AIX, the stock market itself runs on Unix, and therefore it is simply easier to find programmers in the financial world who are used to a Unix environment than a Windows one.
Linux/UNIX are much more usable for concurrent remote users, which makes it easier to script around the systems and to use standard tools like grep/sed/awk/perl/ruby/less on logs; ssh/scp and all that tooling is just there.
There are also technical issues. For example, to measure elapsed time on Windows you can choose between a set of functions based on the Windows clock tick and the hardware-based QueryPerformanceCounter(). The former increments every 10 to 16 milliseconds (note: some documentation implies more precision - e.g. the values from GetSystemTimeAsFileTime() are expressed in 100ns units, but they report the same 100ns edge of the clock tick until it ticks again). The latter - QueryPerformanceCounter() - has show-stopping issues where different cores/CPUs can report clocks-since-startup that differ by tens of milliseconds. MSDN documents this as a possible BIOS bug, but it's common. So, who wants to develop low-latency trading systems on a platform that can't be instrumented properly?
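To make the tick-granularity point concrete, here's a minimal sketch (my illustration, not code from the answer) that spins on GetSystemTimeAsFileTime() and prints how far apart consecutive distinct readings land; on a stock Windows configuration each jump should be the 10-16 ms clock tick:

```cpp
// Minimal sketch: observe the Windows clock-tick granularity.
// GetSystemTimeAsFileTime() reports in 100ns units, but the value only
// advances once per clock tick (~10-16 ms on a default configuration).
#include <windows.h>
#include <cstdio>

static unsigned long long read_filetime() {
    FILETIME ft;
    GetSystemTimeAsFileTime(&ft);
    return (static_cast<unsigned long long>(ft.dwHighDateTime) << 32)
           | ft.dwLowDateTime;
}

int main() {
    unsigned long long prev = read_filetime();
    for (int jumps = 0; jumps < 5;) {
        unsigned long long now = read_filetime();
        if (now != prev) {  // value only changes when the tick advances
            // Difference is in 100ns units: ~100000-160000 => 10-16 ms.
            std::printf("tick advanced by %.3f ms\n", (now - prev) / 10000.0);
            prev = now;
            ++jumps;
        }
    }
    return 0;
}
```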
Many Linux/UNIX variants have lots of easily tweakable parameters to trade off latency for a single event against average latency under load: time slice sizes, scheduling policies, etc.
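To give one concrete flavour of those knobs (my sketch, not from the original answer): on Linux a process can request the SCHED_FIFO real-time scheduling policy via sched_setscheduler(), taking itself out of ordinary time-slicing entirely:

```cpp
// Minimal Linux sketch: opt the calling process into the SCHED_FIFO
// real-time policy, one of the scheduling knobs mentioned above.
// Needs root or CAP_SYS_NICE.
#include <sched.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main() {
    sched_param sp{};
    sp.sched_priority = 50;  // SCHED_FIFO priorities run 1 (low) to 99 (high)
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {  // pid 0 = this process
        std::fprintf(stderr, "sched_setscheduler: %s\n", std::strerror(errno));
        return 1;
    }
    std::printf("running under SCHED_FIFO, priority %d\n", sp.sched_priority);
    // Latency-critical work placed here is no longer time-sliced against
    // ordinary SCHED_OTHER processes.
    return 0;
}
```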
On the FUD/reputation side - somewhat intangible, but an important part of the reasoning behind OS selection - I think most programmers in the industry would simply trust Linux/UNIX more to provide reliable scheduling and behaviour. Further, Linux/UNIX has a reputation for crashing less, though Windows is pretty reliable these days, and Linux has a much more volatile code base than Solaris or FreeBSD.
What's the reported uptime reliability of the various OSes? Would you want your trading system to go down in the middle of a multimillion-dollar arbitrage?
The reason is simple: 10-20 years ago, when such systems emerged, "hardcore" multi-CPU servers ran ONLY on some sort of UNIX. Windows NT was still in kindergarten in those days. So the reason is "historical".
Modern systems might well be developed on Windows; it's just a matter of taste these days.
PS: I am currently working on one such system :-)
There are a variety of reasons, but the reason is not only historical. In fact, more server-side financial applications seem to run on *nix these days than ever before (including big names like the London Stock Exchange, who switched from a .NET platform). For client-side or desktop apps, it would be silly to target anything other than Windows, as that is the established platform. However, for server-side apps, most places I have worked at deploy to *nix.
The performance requirements on the extremely low-latency systems used for algorithmic trading are severe. In this environment, microseconds count.
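To illustrate the scale involved - a minimal sketch of mine, not part of the original answer - here is the kind of nanosecond-resolution timestamping that's routine on Linux/UNIX via clock_gettime(CLOCK_MONOTONIC), in contrast to the 10-16 ms Windows tick discussed above:

```cpp
// Minimal sketch of microsecond-level instrumentation on Linux/UNIX:
// CLOCK_MONOTONIC resolves to nanoseconds, versus a 10-16 ms tick on
// a default Windows clock.
#include <ctime>
#include <cstdio>

static long long now_ns() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main() {
    const long long t0 = now_ns();
    // ... the code path being measured goes here ...
    const long long t1 = now_ns();
    std::printf("elapsed: %lld ns (%.3f us)\n", t1 - t0, (t1 - t0) / 1000.0);
    return 0;
}
```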
I'm not sure about Solaris, but in the case of Linux, these guys are writing and using low-latency patches and customisations for the whole kernel, from the network card drivers on up. It's not that there's a technical reason why that couldn't be done on Windows, but there is a practical/legal one - access to the source code, and the ability to recompile it with changes.
(I've worked in investment banking for 8 years.) In fact, quite a lot of what banks call low latency is done in Java - and not even Real-Time Java, just normal Java with the GC turned off. The main trick here is to make sure you've exercised all of your code enough for the JIT to have run before you switch a particular VM into prod (so you have some startup looping that runs for a couple of minutes - and hot failover).
The reasons for using Linux are:
Familiarity
Remote administration is still better, and also low impact - it will have a minimal effect on the other processes on the machine. Remember, these systems are often co-located at the exchange, so the links to the machines (from you/your support team) will probably be worse than those to your normal datacentres.
Tunability - the ability to set swappiness to 0, have the JVM preallocate large pages, and use other low-level tricks is quite useful (a sketch of one such trick follows this list).
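The item above talks about sysctl swappiness and JVM flags; as a C++-level example of the same family of tricks (my substitution, not the answer's own), mlockall() pins a process's pages in physical RAM so swap can never touch the hot path:

```cpp
// Minimal Linux sketch of an adjacent low-level trick: mlockall() pins
// all current and future pages of the process in physical RAM, so a
// page fault to swap can never land on the critical path.
// Needs CAP_IPC_LOCK or a generous RLIMIT_MEMLOCK.
#include <sys/mman.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main() {
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        std::fprintf(stderr, "mlockall: %s\n", std::strerror(errno));
        return 1;
    }
    std::printf("all pages locked in RAM; swap is off the hot path\n");
    // ... trading logic here ...
    return 0;
}
```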
I'm sure you could get Windows to work acceptably, but there is no huge advantage to doing so - as others have said, any employees you poached would have to rediscover all their latency-busting tricks rather than just run down a checklist.