views: 932
answers: 15

I thought the C/C++ vs C#/Java performance question was well trodden, meaning that I'd read enough evidence to suggest that the VM languages are not necessarily any slower than the "close-to-silicon" languages. Mostly because the JIT compiler can do optimizations that the statically compiled languages cannot.

However, I recently received a CV from a guy who claims that Java-based high frequency trading is always beaten by C++, and that he'd been in a situation where this was the case.

A quick browse on job sites indeed shows that HFT applicants need knowledge of C++, and a look at Wilmott forum shows all the practitioners talking about C++.

Is there any particular reason why this is the case? I would have thought that with modern financial business being somewhat complex, a VM language with type safety, managed memory, and a rich library would be preferred. Productivity is higher that way. Plus, JIT compilers are getting better and better. They can do optimizations as the program is running, so you'd think they'd use that run-time info to beat the performance of the unmanaged program.

Perhaps these guys are writing the critical bits in C++ and calling them from a managed environment (P/Invoke etc.)? Is that possible?

Finally, does anyone have experience with the central question here: why, in this domain, is unmanaged code so clearly preferred over managed?

As far as I can tell, the HFT guys need to react as fast as possible to incoming market data, but this is not necessarily a hard realtime requirement. You're worse off if you're slow, that's for sure, but you don't need to guarantee a certain speed on each response, you just need a fast average.

EDIT

Right, a couple of good answers thus far, but pretty general (well-trodden ground). Let me specify what kind of program HFT guys would be running.

The main criterion is responsiveness. When an order hits the market, you want to be the first to be able to react to it. If you're late, someone else might take it before you, but each firm has a slightly different strategy, so you might be OK if one iteration is a bit slow.

The program runs all day long, with almost no user intervention. Whatever function is handling each new piece of market data is run dozens (even hundreds) of times a second.

These firms generally have no limit as to how expensive the hardware is.

+2  A: 

There are reasons to use C++ other than performance. There is a HUGE existing library of C and C++ code. Rewriting all of that in alternate languages would not be practical. In order for things like P/Invoke to work correctly, the target code has to be designed to be called from elsewhere. If nothing else you'd have to write some sort of wrapper around things exposing a completely C API because you can't P/Invoke to C++ classes.

Finally, P/Invoke is a very expensive operation.
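
For illustration, here is a minimal sketch of the kind of flat C wrapper that has to sit between a C++ class and P/Invoke (the class and function names are hypothetical, not from any real library):

    // pricer.cpp -- hypothetical C++ class plus a flat C API for P/Invoke.
    class Pricer {
    public:
        double Price(double bid, double ask) const { return (bid + ask) / 2.0; }
    };

    // P/Invoke can only bind to free functions with C linkage, so the class is
    // exposed through opaque-handle wrappers (add __declspec(dllexport) on Windows).
    extern "C" {
        Pricer* pricer_create() { return new Pricer(); }
        double  pricer_price(Pricer* p, double bid, double ask) { return p->Price(bid, ask); }
        void    pricer_destroy(Pricer* p) { delete p; }
    }

On the managed side each of these would be bound with an ordinary [DllImport] declaration, and every call still pays the managed-to-native transition cost mentioned above.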

JIT compilers are getting better and better. They can do optimizations as the program is running

Yes, they can do this. But you forget that any C++ compiler is able to do the same optimizations. Sure, compile time will be worse, but the very fact that such optimizations have to be done at runtime is overhead. There are cases where managed languages can beat C++ at certain tasks, but this is usually because of their memory models and not the result of runtime optimizations. Strictly speaking, you could of course have such a memory model in C++, EDIT: such as C#'s handling of strings, /EDIT but few C++ programmers spend as much time optimizing their code as JIT guys do.

There are some performance issues that are an inherent downside to managed languages -- namely disk I/O. It's a one-time cost, but depending on the application it can be significant. Even with the best optimizers, you still need to load 30MB+ of JIT compiler from disk when the program starts; whereas it's rare for a C++ binary to approach that size.

Billy ONeal
"But you forget that any C++ compiler is able to do the same optimizations". C++ compilers do not do things like on-line profile-guided optimizations.
Jon Harrop
@Jon: Neither do most JITs. And you can do profile-guided optimizations offline.
Billy ONeal
@Billy: HotSpot does.
Jon Harrop
+13  A: 

A JIT compiler could theoretically perform a lot of optimizations, yes, but how long are you willing to wait? A C++ app can take hours to compile because it happens offline, and the user isn't sitting there tapping his fingers and waiting.

A JIT compiler has to finish within a couple of milliseconds. So which do you think can get away with the most complex optimizations?

The garbage collector is a factor too. Not because it is slower than manual memory management per se (I believe its amortized cost is pretty good, definitely comparable to manual memory handling), but it's less predictable. It can introduce a stall at pretty much any point, which might not be acceptable in systems that are required to be extremely responsive.

And of course, the languages lend themselves to different optimizations. C++ allows you to write very tight code, with virtually no memory overhead, and where a lot of high level operations are basically free (say, class construction).

In C# on the other hand, you waste a good chunk of memory. And simply instantiating a class carries a good chunk of overhead, as the base Object has to be initialized, even if your actual class is empty.

C++ allows the compiler to strip away unused code aggressively. In C#, most of it has to be there so it can be found with reflection.

On the other hand, C# doesn't have pointers, which are an optimizing compiler's nightmare. And memory allocations in a managed language are far cheaper than in C++.

There are advantages either way, so it is naive to expect that you can get a simple "one or the other" answer. Depending on the exact source code, the compiler, the OS, the hardware it's running on, one or the other may be faster. And depending on your needs, raw performance might not be the #1 goal. Perhaps you're more interested in responsiveness, in avoiding unpredictable stalls.

In general, your typical C++ code will perform similarly to equivalent C# code. Sometimes faster, sometimes slower, but probably not a dramatic difference either way.

But again, it depends on the exact circumstances. And it depends on how much time you're willing to spend on optimization. If you're willing to spend as much time as it takes, C++ code can usually achieve better performance than C#. It just takes a lot of work.

And the other reason, of course, is that most companies who use C++ already have a large C++ code base which they don't particularly want to ditch. They need that to keep working, even if they gradually migrate (some) new components to a managed language.

jalf
jalf - JIT compilers can cache their results (i.e. .NET) so that you only get a hit on the first execution. Also in the case of .NET they can optimize on a per-machine basis from a single source code base - something that a static compiler can't do. I'd be surprised if Java didn't do similar things
Peter M
@Peter M: Sure a static compiler can do that. For example, if you want to increase speed on platforms with SSE, a compiler could include both code that uses SSE and code that does not.
Billy ONeal
@Peter: yes, but most users are not willing to add 2 hours to their startup time, is my point. Cached or not, the problem is that advanced optimizations take time, and no matter how aggressively you cache, they have to be performed at least once.
jalf
As for optimizing on a per machine basis, true, but how much does it actually happen? Taking Billy's example, I know that .NET doesn't actually generate SSE code at all, regardless of whether it is supported on the target machine.
jalf
@Billy - You're right .. I suppose it all comes down to where you want to perform the optimization. But it does mean your older static system can't take advantage of newer hardware without explicit recompilation by the developer. As opposed to the user just buying a new machine and having the code adapt to new hardware.
Peter M
@jalf - start up times only come once per new program. For a system that runs 24/7/365 I don't see a single instance of 2 hours of compilation to be an issue. You might as well say that the statically compiled system was delivered 2 hours later because of the extra time to compile the program.
Peter M
@jalf - the per machine compilation occurs through the framework installed on the target machine knowing what optimizations can be done. So the smarts that would have been in the static compiler are shifted to the on-machine framework. As for .NET and SSE, how do I get my static compiler to use SSE?
Peter M
@Peter: you might not see it as an issue, but the people who wrote the JIT compiler did. And so the JIT compiler **does not** spend 2 hours optimizing when you first run the program, which makes your point a bit academic. Yes, sure, in theory a JIT compiler could perform every optimization under the sun. But in practice they don't, because 99.999% of their customers do *not* want to wait 2 hours when they first start their program.
jalf
Moreover, the JIT compiler's strength is that it can optimize *on the fly*. It can't take advantage of this by precompiling everything when the program first runs, because all the runtime information is not available until the code runs (and preferably, has run a few hundred thousand iterations)
jalf
@Peter M - A C++ compiler can be set to optimize for a specific machine.
ChrisW
re. SSE: Some compilers can automatically vectorize (simple) loops, so it might happen for free there. Otherwise, you use compiler-specific intrinsics and language extensions. But whether it happens automatically or manually, it is *possible* to make use of SSE. Under .NET, it isn't.
jalf
@jalf - if you install a new .NET-based system there is a background process that will do compilation at installation, not runtime. If you google for it you will find a lot of consternation about what this mysterious process is.
Peter M
@ChrisW: Kind of. You can optimize for specific instruction sets (with/without SSE2, for example), and some compilers allow you to specify a specific CPU model to tune for. But that's still a lot more coarse grained than the info a JIT compiler has access to: it could in theory take amount of RAM vs RAM speed, CPU speed, OS version and a dozen other factors into account. It doesn't, in practice, but it *could*. Theoretically, a JIT compiler has a lot of advantages. My point was simply that it also has one major *disadvantage*: it doesn't have the same time budget as a static compiler.
jalf
@jalf - back a few generations there was an optional process for Windows NT running on a DEC Alpha where the x86 code would be converted to native Alpha code on the fly in order to get a speed-up. This only converted code paths that were actually executed. I tried it for fun once or twice, but the execution of x86 code by the Alpha was good enough for what I was doing at the time.
Peter M
@Peter: I don't see the relevance. How are your last two comments at all related to this discussion? Yes, a JIT compiler can be run at install time as well as at startup, or on the fly. It doesn't change the fact that people don't have the patience to wait 6 hours for it to finish. And even if *you* have that kind of patience, the JIT designers have **assumed** that people don't have this kind of patience, and so their compiler *does not* perform the same optimizations that a static compiler might do. There is no theoretical reason why it *couldn't* be done. But it isn't.
jalf
@jalf - I think that the main difference we have is that you seem to fixate on start up time equating to performance. If you have a system that starts up in 1 ms and makes you $1,000,000 per day, but I have a JIT one which takes 23 hours to start up (and has to be started every day) but makes $10,000,000 per day - then which is the "better" system?
Peter M
@Peter: No, the JIT version *will not* take 23 hours to start up, it will just skip on optimizations to get the application started faster. That's what JIT compilers do, they don't spend hours on startup, they start the program rather fast even though that means that they don't have time to do all the optimization possible.
sth
@Peter: no, the main difference is that I'm talking about the real world. In the real world, the JIT version doesn't spend more than a second or two on optimization *in total*, including startup, on-the-fly optimizations and everything else. A JIT compiler does not perform expensive optimizations. Yes, it **could**, but it doesn't. And therefore, it is not superior to a static compiler. It has its strengths, but also this one critical weakness. My answer was just trying to explain the advantages of either system *in the real world*.
jalf
You're right on a theoretical level though. In theory, a static compiler has no advantages whatsoever. But in practice, JIT and static compilers are designed with different goals in mind, which means a static compiler will do optimizations that a JIT wouldn't even consider. :)
jalf
"And memory allocations in a managed language are far cheaper than in C++". Have you actually tested that lately?
Jon Harrop
-1: For attributing C++'s slow compilation time to optimizations when it is actually because the language is crap. OCaml was regularly as fast as C++ whilst compiling orders of magnitude more quickly.
Jon Harrop
@Jon: It's both. Try measuring the difference in compile time if you toggle MSVC's link-time codegen on or off. Optimizations make a big difference on compile time, but yes, the language is *also* ridiculously hard to compile and even without optimizations it is a slow process. I never said otherwise (and you might find this old post of mine interesting: http://stackoverflow.com/questions/318398#318440 ). I never "attributed" C++'s compilation time to optimizations. I said that significant time is spent on optimizations, not that it is the *only*, or even the *biggest* factor.
jalf
As for memory allocations? Why yes, I have, actually. In .NET, an allocation is nothing more than incrementing a pointer. In native code, it has to traverse the heap finding an appropriate block of memory. That is quite a bit more expensive even in the best case. What was your point again?
jalf
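As a toy illustration of that fast path -- allocation as a bounds check plus a pointer bump -- here is a single-threaded, non-collecting arena in C++ (purely illustrative; a real GC nursery also handles alignment, collection, and synchronization):

    #include <cstddef>
    #include <vector>

    // Toy bump allocator: an allocation is one bounds check and one pointer bump.
    class BumpArena {
        std::vector<char> buf_;
        std::size_t used_ = 0;
    public:
        explicit BumpArena(std::size_t bytes) : buf_(bytes) {}
        void* allocate(std::size_t n) {
            if (used_ + n > buf_.size()) return nullptr;  // a real GC would collect here
            void* p = buf_.data() + used_;
            used_ += n;
            return p;
        }
        void reset() { used_ = 0; }  // "collection" in the toy version: drop everything
    };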
@jalf: "In .NET, an allocation is nothing more than incrementing a pointer". That is not true. When the gen0 fills everything live in it gets copied out into freshly allocated blocks from gen1 (found by traversing the heap) and synchronizations are incurred. Those costs are amortized across all allocations to gen0 that remain live but they are *massive* costs.
Jon Harrop
@jalf: "It's both". You haven't separated concerns. Many of the optimizations C++ compilers do are made much easier and more efficient by changing the language. Look at aliasing, for example. The awful compile time of C++ compilers does not mean that C++-like performance requires awful compile times.
Jon Harrop
@Jon: I'm not sure what your point with optimizations is. What I'm saying is that a static compiler **can afford** to spend time on optimizations that would be too expensive to perform at JIT-time. That is true for *any* language, and is not specific to C++ at all. I get that you enjoy bashing C++ (and in many ways the language deserves it), but it has **nothing** to do with what I'm saying. As for allocation cost, true, in the worst case, it can be costly in .NET too -- by performing operations similar to what C++ has to do in **every** case. End result: managed allocations are faster.
jalf
@Jon: where did I say that "C++-like performance requires awful compile times"? If you're going to argue that I'm wrong, could you at least base it in what I actually **said**?
jalf
"by performing operations similar to what C++ has to do in every case". C++ does not perform a copying garbage collector in every case.
Jon Harrop
"Why yes, I have, actually". What C++ allocator did you use and what were you results?
Jon Harrop
"What I'm saying is that a static compiler can afford to spend time on optimizations that would be too expensive to perform at JIT-time". What are the names of these optimizations?
Jon Harrop
Here is an example of the limitations of the .NET JIT compiler when it comes to inlining: http://blogs.msdn.com/b/davidnotario/archive/2004/11/01/250398.aspx. Further, the JIT compiler operates per-method, and so performs zero interprocedural analysis.
jalf
Also, many optimizations are applied iteratively in a static compiler, to take advantage of optimization opportunities resulting from earlier optimizations. A JIT typically cuts down on the number of iterations (perhaps only running a single iteration). A lot of optimizations (register allocation, for example) are NP-complete problems, and so they are generally implemented as heuristics giving a "good enough" approximation. And more time thrown at the problem enables a better approximation.
jalf
Now, I think I'm done wasting my time. You may want to read the *question* I answered, as well as my actual answer. Then sit down and ask yourself if you have any questions of **actual relevance** to those. I don't see the relevance of OCaml or C++'s terrible compile times, and I don't see why my answer is improved by providing you with a complete listing of every goddamn optimization performed by static and JIT compilers.
jalf
+21  A: 

A lot of it comes down to a simple difference between fact and theory. People have advanced theories to explain why Java should be (or at least might be) faster than C++. Most of the arguments have little to do with Java or C++ per se, but with dynamic versus static compilation, with Java and C++ really being little more than examples of the two (though, of course, it's possible to compile Java statically, or C++ dynamically). Most of these people have benchmarks to "prove" their claim. When those benchmarks are examined in any detail, it quickly becomes obvious that in quite a few cases, they took rather extreme measures to get the results they wanted (e.g., quite a number enable optimization when compiling the Java, but specifically disabled optimization when compiling the C++).

Compare this to the Computer Language Benchmarks Game, where pretty much anybody can submit an entry, so all of the code tends to be optimized to a reasonable degree (and, in a few cases, even an unreasonable degree). It seems pretty clear that a fair number of people treat this as essentially a competition, with advocates of each language doing their best to "prove" that their preferred language is best. Since anybody can submit an implementation of any problem, a particularly poor submission has little effect on overall results. In this situation, C and C++ emerge as clear leaders.

Worse, if anything, these results probably show Java in a better light than is entirely accurate. In particular, somebody who uses C or C++ and really cares about performance can (and often will) use Intel's compiler instead of g++. This will typically give at least a 20% improvement in speed compared to g++.

Edit (in response to a couple of points raised by jalf, but really too long to fit reasonably in comments):

  1. Pointers being an optimizer writer's nightmare. This is really overstating things (quite) a bit. Pointers lead to the possibility of aliasing, which prevents certain optimizations under certain circumstances (see the sketch after this list). That said, inlining prevents the ill effects much of the time (i.e., the compiler can detect whether there's aliasing rather than always generating code under the assumption that there could be). Even when the code does have to assume aliasing, caching minimizes the performance hit from doing so (i.e., data in L1 cache is only minutely slower than data in a register). Preventing aliasing would help performance in C++, but not nearly as much as you might think.

  2. Allocation being a lot faster with a garbage collector. It's certainly true that the default allocator in many C++ implementations is slower than what most (current) garbage collected allocators provide. This is balanced (to at least a degree) by the fact that allocations in C++ tend to be on the stack, which is also fast, whereas in a GC language nearly all allocations are usually on the heap. Worse, in a managed language you usually allocate space for each object individually whereas in C++ you're normally allocating space for all the objects in a scope together.
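
Regarding point 1, a minimal sketch of the aliasing problem (using the non-standard but widely supported __restrict extension; the function is purely illustrative):

    // Without the annotations the compiler must assume dst may overlap src, so it
    // either emits a runtime overlap check or gives up on vectorizing the loop.
    // With __restrict the programmer promises no overlap, so the loop can be
    // vectorized unconditionally.
    void scale(float* __restrict dst, const float* __restrict src, float k, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = src[i] * k;
    }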

It's also true that C++ directly supports replacing allocators both globally and on a class-by-class basis, so when/if allocation speed really is a problem it's usually fairly easy to fix.
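
Regarding point 2, a hedged sketch of what replacing an allocator on a class-by-class basis can look like (a deliberately naive, single-threaded free list; the class and its members are hypothetical):

    #include <cstddef>
    #include <new>

    class Order {
    public:
        // Class-specific allocation: recycle freed Orders through a simple free list.
        static void* operator new(std::size_t size) {
            if (free_list_) {
                void* p = free_list_;
                free_list_ = *static_cast<void**>(p);  // pop the next-pointer stored in the block
                return p;
            }
            return ::operator new(size);               // fall back to the global allocator
        }
        static void operator delete(void* p) noexcept {
            if (!p) return;
            *static_cast<void**>(p) = free_list_;      // push the block back onto the list
            free_list_ = p;
        }
    private:
        static inline void* free_list_ = nullptr;      // inline static member needs C++17
        double price_ = 0.0;
        long   quantity_ = 0;
    };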

Ultimately, jalf is right: both of these points undoubtedly do favor "managed" implementations. The degree of that improvement should be kept in perspective though: they're not enough to let dynamically compiled implementations run faster on much code -- not even benchmarks designed from the beginning to favor them as much as possible.

Edit2: I see Jon Harrop has attempted to insert his two (billionths of a) cent's worth. For those who don't know him, Jon's been a notorious troll and spammer for years, and seems to be looking for new ground into which to sow weeds. I'd try to reply to his comment in detail, but (as is typical for him) it consists solely of unqualified, unsupported generalizations containing so little actual content that a meaningful reply is impossible. About all that can be done is to give onlookers fair warning that he's become well known for being dishonest, self-serving, and best ignored.

Jerry Coffin
+1 for the *Language Shootout* reference, a fantastic resource I have pointed to in the past as well, and another +1 (if I could) for the 'theory versus fact' context -- so true! That said, it is not entirely appropriate here because the C# folks want to run on Windows only, for which we have no benchmark comparison. On a related note, I also heard that gcc/g++ 4.5 is closing in on icc so the '20% improvement' may no longer hold. Would be worth another set of benchmarks.
Dirk Eddelbuettel
+1 for theory vs fact!
5ound
@Dirk: MSVC++? You can get Express for free. They also offer a lovely Profile Guided Optimization function in their Ultimate editions and a powerful profiler.
DeadMG
@DeadMG: Great, so please build the benchmark site. I am a happy Linux user, and thus served with the existing *Language Shootout*. Original poster is not.
Dirk Eddelbuettel
@Dirk Eddelbuettel: It's not my job to build a benchmark site to cover the failings of the original benchmark site.
DeadMG
@DeadMG: Do you understand that the *Shootout* site appears to be based on times from an Ubuntu Linux server? Hence the lack of usefulness of your MSVC++ recommendation.
Dirk Eddelbuettel
The performance impact of aliasing is potentially *huge*. You're right, in practice a C++ compiler spends a lot of effort overcoming it (both through inlining and static analysis in general) -- but that's my point. It's a compiler writer's nightmare because it takes a lot of effort to overcome and the issue is so ubiquitous in C/C++. It's pretty much *the* reason why Fortran is still faster than C++ at numerical computations. The compiler *does* try to deal with these aliasing issues, but it takes a lot of effort and it doesn't always work.
jalf
As for memory allocations, you're right about stack usage in C++. My point was just that the primitive operation "allocate x bytes from the heap" is much more expensive in C++. Again, this can be compensated for by using memory pools and custom allocators, or allocating objects on the stack and so on. But it shows one advantage a managed language has. My point was just that there are advantages on either side, so it's not quite as simple as "one is faster than the other". It depends.
jalf
@jalf: Of course, because C++ offers you all the power in the world, you can overcome this and use a pool. Try to create an object in C# that doesn't inherit from System.Object. @Dirk: I really don't care. The point is that the site is useless for a Windows user, since it compares Linux performance. I'm not attacking it for not comparing Windows. I'm just saying that it's not relevant. And I really don't care why.
DeadMG
@Jerry Coffin >> the Great Computer Language Shootout << Please correct to the title shown on the website: The Computer Language Benchmarks Game.
igouy
@Jerry Coffin >> optimized to a reasonable degree (and, in a few cases, even an unreasonable degree) << Yeah, although I do try to push those into "interesting alternative" programs.
igouy
-1: Your statements about allocation performance are simply not true and you missed the point that they are incomparable anyway because GC's are thread safe.
Jon Harrop
+8  A: 

These firms generally have no limit as to how expensive the hardware is.

If they also don't care how expensive the software is, then I'd think that of course C++ can be faster: for example, the programmer might use custom-allocated or pre-allocated memory; and/or they can run code in the kernel (avoiding ring transitions), or on a real-time O/S, and/or have it closely-coupled to the network protocol stack.
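
None of this is from the answer itself, but as one concrete (Linux-specific, hedged) illustration of that "control everything" style: pinning the hot thread to a dedicated core and locking the process's pages into RAM so that neither the scheduler nor the pager adds jitter:

    #include <pthread.h>
    #include <sched.h>
    #include <sys/mman.h>

    // Linux sketch (compile with g++, which defines _GNU_SOURCE): pin the calling
    // thread to one core and lock all current and future pages into physical
    // memory to avoid page-fault stalls. mlockall typically needs privileges.
    bool pin_and_lock(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
            return false;
        return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
    }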

ChrisW
Aha, these sound like some real advantages.
Carlos
Actually I would say the trend for dealing with kernel/user space transitions is to push more into user space rather than into the kernel.
pgast
+1  A: 

The simple fact is that C++ is designed for speed. C#/Java aren't.

Take the innumerable inheritance hierarchies endemic to those languages (such as IEnumerable), compared to the zero overhead of std::sort or std::for_each being generic. C++'s raw execution speed isn't necessarily any faster, but the programmer can design fast or zero-overhead systems. Even things like buffer overrun detection: in C# you can't turn it off. In C++, you have control. Fundamentally, C++ is a fast language: you don't pay for what you don't use. In contrast, in C#, if you use, say, stackalloc, you can't NOT do buffer overrun checking. You can't allocate classes on the stack, or contiguously.
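
As a small, hedged illustration of the zero-overhead point about generic algorithms (the types and names here are just for illustration):

    #include <algorithm>
    #include <vector>

    struct Tick { double price; long size; };

    void sort_by_price(std::vector<Tick>& ticks) {
        // The lambda is a concrete type, so std::sort is instantiated for it and the
        // comparison is typically inlined -- no interface or virtual call per element.
        std::sort(ticks.begin(), ticks.end(),
                  [](const Tick& a, const Tick& b) { return a.price < b.price; });
    }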

There's also the whole compile-time thing, where C++ apps can take much longer, both to compile, and to develop.

DeadMG
+1 for being specific
Carlos
C# was designed to not be as slow as Java. The whole point of structs in C# is that you can allocate them on the stack or have contiguous arrays of them. You can also get pointers to objects and unsafely use them willy-nilly without bounds checking.
Gabe
@Gabe: Pointers don't have bounds checking. Arrays do. In addition, I sure hope that I didn't want a type that can both be contiguously arrayed AND referred to without interfering with the normal operations of my language. And there's still the whole ridiculous-quantities-of-inheritance thing. When you write C++, your class does exactly and only what you want it to, and interoperating with other classes or the language libraries in a generic fashion can be done with NO runtime overhead. C# can't do either of those things. In addition, I can't make library classes into structs to exhibit that.
DeadMG
DeadMG: You appear to be trying very hard to disagree with me, and unless I'm not understanding you, without much success.
Gabe
C++ isn't designed for speed as such. A language designed for speed would do a lot more to eliminate aliasing. C++ is just designed for a lot of things that *also* happen to mean that it's fairly simple to translate C++ code into reasonably efficient machine code.
jalf
@Gabe: I am disagreeing with you because you are wrong. I brought up some limitations of C#, and you've come up with a vast, vast, vast minority of scenarios when they don't apply. For example, you CAN get a pointer to an object, but not beyond the current scope. You CAN allocate structs contiguously, but you can't stop them inheriting from System.Object, wasting cache space and enabling casts and functions you don't want. In C++, every type can be a reference type AND a value type AND be accessed through an unsafe pointer if you like in ALL scenarios.
DeadMG
DeadMG: Sorry, but you're still not disagreeing with me. You talked about how C++ is better than C# and Java. I talked about how C# is better than Java. Then you commented again about how C++ is better than C#, but that's not disagreeing with me because I didn't say anything about C++.
Gabe
jalf is right: C++ was designed for low-overhead, not for speed. Fortran was designed for speed, which is why it's hard to write numerical algorithms that are faster in C++ than in Fortran.
Gabe
@Gabe: The question specifically asks for C++ vs VM languages. If you're not commenting on C++ vs VM languages for performance, why are you here?
DeadMG
DeadMG: You said "C++ > Java, C#". I said "C# > Java". Then you said, "No, C++ > C#". All I meant was that it's really "C++ > C# > Java".
Gabe
@Gabe: Excuse me for assuming that you were answering the question.
DeadMG
+1  A: 

It's not only a matter of programming language; the hardware and operating system are relevant too.
You will get the best overall performance with a realtime operating system, a realtime programming language, and efficient (!) programming.

So you have quite a few possibilities in choosing an operating system, and a few in choosing the language. There's C, Realtime Java, Realtime Fortran and a few others.

Or maybe you will get the best results by programming an FPGA/processor directly, eliminating the cost of an operating system.

The biggest choice you have to make is how many possible performance optimizations you will give up in favor of a language that eases development and runs more stably, because you will introduce fewer bugs, which results in higher availability of the system. This shouldn't be overlooked. There is no win in developing an application that performs 5% faster than any other but crashes every so often due to minor, hard-to-find bugs.

Tobias P.
+1 I hadn't heard of RT Java
Carlos
+1  A: 

In HFT, latency is a bigger issue than throughput. Given the inherent parallelism in the data source, you can always throw more cores at the problem, but you can't make up for response time with more hardware. Whether the language is compiled beforehand, or Just-In-Time, garbage collection can destroy your latency. There exist realtime JVMs with guaranteed garbage collection latency. It's a fairly new technology, a pain to tune, and ridiculously expensive, but if you have the resources, it can be done. It'll probably become much more mainstream in coming years, as the early adopters fund the R&D that's going on now.

Chris
"There's always the next release that will be really fast." Java people have said that for fifteen years ;-)
Dirk Eddelbuettel
AFAIK, real time GCs cost a *lot* in terms of throughput (like 50%).
Jon Harrop
A: 

A huge reason to prefer C++ (or lower level) in this case, beyond what has already been said, is that there are adaptability benefits to being low level.

If hardware technology changes, you can always drop into an __asm { } block and actually use it before languages/compilers catch up.

For example, there is still no support for SIMD in Java.
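
For what it's worth, on x86 you often don't even need a raw __asm block for this: compiler intrinsics expose the SIMD units directly from C++. A hedged sketch (assumes SSE support and that n is a multiple of 4):

    #include <xmmintrin.h>  // SSE intrinsics, supported by MSVC, GCC and Clang

    // y[i] += a * x[i], four floats per iteration using SSE registers.
    void axpy(float* y, const float* x, float a, int n) {
        __m128 va = _mm_set1_ps(a);
        for (int i = 0; i < n; i += 4) {
            __m128 vx = _mm_loadu_ps(x + i);
            __m128 vy = _mm_loadu_ps(y + i);
            _mm_storeu_ps(y + i, _mm_add_ps(vy, _mm_mul_ps(va, vx)));
        }
    }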

Inverse
"still no support for SIMD in Java" or .NET.
Jon Harrop
+2  A: 

This might be kinda off topic, but I watched a video a couple of weeks ago which might be of interest to you: http://ocaml.janestreet.com/?q=node/61

It comes from a trading company which decided to use OCaml as its main language for trading, and I think their motivations should be enlightening to you (basically, they valued speed of course, but also strong typing and functional style for quicker increments as well as easier comprehension).

Shautieh
Indeed, F# (Microsoft's take on OCaml) is often used for this application due to its speed (better than OCaml: http://flyingfrogblog.blogspot.com/2009/07/ocaml-vs-f-qr-decomposition.html)
Gabe
I don't know much about F#, but if I am remembering the video I linked earlier well, they chose OCaml over F# and don't intend to switch in any foreseeable future. One reason being that F# runs on .NET, which wasn't designed specifically for functional languages (and thus isn't always as optimised as it could be)...
Shautieh
I asked them about this when I was developing HLVM and they said that symbolic performance was equally important to them as numeric. F# generally has better numeric performance but its symbolic performance is much worse (often ~5× slower than OCaml) because .NET's GC is not optimized for this.
Jon Harrop
Thanks for the update, but how much is "5×" supposed to be ? ;)
Shautieh
+1  A: 

Virtual Execution Engines (the JVM or .NET's CLR) do not permit structuring the work in a time-efficient way, as process instances cannot run on as many threads as might be needed.

In contrast, plain C++ enables execution of parallel algorithms and construction of objects outside the time-critical execution paths. That’s pretty much everything – simple and elegant. Plus, with C++ you pay only for what you use.

Ivan Klianev
I've programmed threads with C++ and with .NET and I have no idea what you mean. Could you explain what you can do with C++ threads and not with e.g. .NET threads?
nikie
+17  A: 

Firstly, 1 ms is an eternity in HFT. If you think it is not then it would be good to do a bit more reading about the domain. (It is like being 100 miles away from the exchange.) Throughput and latency are deeply intertwined as the formulae in any elementary queuing theory textbook will tell you. The same formulae will show jitter values (frequently dominated by the standard deviation of CPU queue delay if the network fabric is right and you have not configured quite enough cores).
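
The answer doesn't name a specific model, but as one textbook illustration of how tightly latency and throughput are coupled: in an M/M/1 queue with arrival rate λ and service rate μ, the mean response time is

    W = \frac{1}{\mu - \lambda}

which blows up as utilization λ/μ approaches 1, and the variance of the response time (the jitter) grows even faster.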

One of the problems with HFT arbitrage is that once you decide to capture a spread, there are two legs (or more) to realize the profit. If you fail to hit all legs you can be left with a position that you really don't want (and a subsequent loss) - after all you were arbitraging not investing.

You don't want positions unless your strategy is predicting the (VERY near term!!!) future (and this, believe it or not, is done VERY successfully). If you are 1 ms away from the exchange then some significant fraction of your orders won't be executed and what you wanted will be picked off. Most likely the ones that have executed one leg will end up losers or at least not profitable.

Whatever your strategy is, for argument's sake let us say it ends up with a 55%/45% win/loss ratio. Even a small change in the win/loss ratio can make a big change in profitability.

re: "run dozens (even hundreds)" seems off by orders of magnitude Even looking at 20000 ticks a second seems low, though this might be the average for the entire day for the instrument set that he is looking at.

There is high variability in the rates seen in any given second. I will give an example. In some of my testing I look at 7 OTC stocks (CSCO, GOOG, MSFT, EBAY, AAPL, INTC, DELL). In the middle of the day the per-second rates for this stream can range from 0 mps (very, very rare) to almost 2,000 quotes and trades per peak second. (See why I think the 20,000 above is low.)

I build infrastructure and measurement software for this domain and the numbers we talk about are 100,000s and millions per second. I have C++ producer/consumer infrastructure libraries that can push almost 5,000,000 (5 million) messages/second between producer and consumer (32-bit, 2.4 GHz cores). These are 64-byte messages with new, construct, enqueue, synchronize on the producer side and synchronize, dequeue, touch every byte, run virtual destructor, free on the consumer side. Now admittedly that is a simple benchmark with no socket IO (and socket IO can be ugly) as would be at the end points of the pipe stages. It is ALL custom synchronization classes that only synchronize when empty, custom allocators, custom lock-free queues and lists, occasional STL (with custom allocators), but more often custom intrusive collections (of which I have a significant library). More than once I have given a vendor in this arena a quadrupling (and more) in throughput without increased batching at the socket endpoints.
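
The custom infrastructure described above isn't shown, but as a much-simplified, hedged sketch of one such building block, here is a single-producer/single-consumer lock-free ring buffer (C++11 atomics; capacity must be a power of two; real versions add cache-line padding between the two indices to avoid false sharing):

    #include <atomic>
    #include <cstddef>
    #include <vector>

    // Single-producer / single-consumer ring buffer: synchronizes only through two
    // atomic counters, with no locks and no allocation on the hot path.
    template <typename T>
    class SpscQueue {
        std::vector<T> buf_;
        const std::size_t mask_;
        std::atomic<std::size_t> head_{0};  // advanced by the consumer
        std::atomic<std::size_t> tail_{0};  // advanced by the producer
    public:
        explicit SpscQueue(std::size_t capacity_pow2)
            : buf_(capacity_pow2), mask_(capacity_pow2 - 1) {}

        bool push(const T& v) {             // producer thread only
            std::size_t t = tail_.load(std::memory_order_relaxed);
            if (t - head_.load(std::memory_order_acquire) == buf_.size())
                return false;               // full
            buf_[t & mask_] = v;
            tail_.store(t + 1, std::memory_order_release);
            return true;
        }

        bool pop(T& out) {                  // consumer thread only
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire))
                return false;               // empty
            out = buf_[h & mask_];
            head_.store(h + 1, std::memory_order_release);
            return true;
        }
    };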

I have OrderBook and OrderBook::Universe classes that take less than 2us for new, insert, find, partial fill, find, second fill, erase, delete sequence when averaged over 22000 instruments. The benchmark iterates over all 22000 instruments serially between the insert first fill and last fill so there are no cheap caching tricks involved. Operations into the same book are separated by accesses of 22000 different books. These are very much NOT the caching characteristics of real data. Real data is much more localized in time and consecutive trades frequently hit the same book.

All of this work involves careful consideration of the constants and caching characteristics in any of the algorithmic costs of the collections used. (Sometimes it seems that the K's in K*O(n), K*O(n*log n), etc., are dismissed a bit too glibly.)

I work on the market data infrastructure side of things. It is inconceivable to even think of using Java or a managed environment for this work. And when you can get this kind of performance with C++ (and I think it is quite hard to get million+ messages/sec performance with a managed environment), I can't imagine any of the significant investment banks or hedge funds (for whom a $250,000 salary for a top-notch C++ programmer is nothing) not going with C++.

Is anybody out there really getting 2,000,000+ messages/sec out of a managed environment? I know a few people in this arena and no one ever bragged about it to me. And I think 2 million in a managed environment would have some bragging rights.

I know of one major player's FIX order decoder doing 12,000,000 field decodes/sec (3 GHz CPU). It is C++ and the guy who wrote it almost challenged anybody to come up with something in a managed environment that is even half that speed.

Technologically it is an interesting area with lots of fun performance challenges. Consider the options market when the underlying security changes - there might be say 6 outstanding price points with 3 or 4 different expiration dates. Now for each trade there were probably 10-20 quotes. Those quotes can trigger price changes in the options. So for each trade there might be 100 or 200 changes in options quotes. It is just a ton of data - not a Large Hadron Collider collision-detector-like amount of data but still a bit of a challenge. It is a bit different than dealing with keystrokes.

Even the debate about FPGAs goes on. Many people take the position that a well-coded parser running on 3 GHz commodity HW can beat a 500 MHz FPGA. But even if a tiny bit slower (not saying they are), FPGA-based systems can tend to have tighter delay distributions. (Read "tend" - this is not a blanket statement.) Of course, if you have a great C++ parser that you push through a Cfront and then push that through the FPGA image generator... But that's another debate...

pgast
Wow, this is fascinating! Very detailed as well. A couple of questions: If you're responding in fractions of a millisecond, doesn't that severely limit the kind of strategy that you can implement? I can't imagine any even very simple strategy having enough time to calculate. Does this mean that all these HFT strategies are simply make/take arbitrages?
Carlos
No, it means that everything around the strategy is as good as it can be made. The strategy is the strategy.
pgast
The advantage of an FPGA is that you can have a *wide* pipeline(want a 2Kbit wide pipe? you got it!) and custom concurrent logic for super tight time constraints (of course, the max clock speed for the FPGA is slower than the max CPU speed). Looking at the latest Xilinx FPGA, they estimate up to terabit speed IO throughput for their machine.
Paul Nathan
@pgast: Do you find that managed languages offer insignificant benefits in terms of correctness in the context of this application?
Jon Harrop
I really can't answer that because I just don't use managed languages in this context. Period. As I said, I am on the market data side of things - upstream of the actual trading applications and strategy. When I said "The strategy is the strategy" - that code is NEVER seen except by a few (if it is a boutique firm it may be just the founders) - it is protected like the Crown Jewels. This area has a lot of VERY smart people in it.
pgast
The "run of the mill" guy with 750 on his math SATS and 800 on advanced math and one other science achievement test can end up exactly that at one of these firms - "run of the mill" and average. Correctness tends to be a little easier and less mysterious for this part of the bell curve.
pgast
Certainly they find it a bit easier than does a population who reads some buzz phrase on the internet or the popular programming book of the day "101 ways to better <insert your favorite language here> in 3 microseconds" and then pompously and condescendingly parrots the same crap back over and over again on some bulletin board or web site and thinks he is smart.
pgast
+1  A: 

Most of our code ends up having to be run on a grid of 1000s of machines.

I think this environment changes the argument. If the difference between C++ and C# execution speed is 25%, for example, then other factors come into play. When this is run on a grid, it may make no difference how it is coded, since any shortfall once the work is spread across machines may not matter, or can be solved by allocating or purchasing a few more machines. The most important issue and cost may become 'time to market', where C# may prove the winner and the faster option.

Which is faster, C++ or C#?

C# by six months......

Tony Lambert
You can't really say that C# is faster by a certain time period. Good developers in C++ can code just as fast as developers in C#, unless of course you hire crappy C++ developers and excellent C# developers.
Raindog
I think that was what they call a joke to illustrate a point. I've been coding C++ the best part of 20 years and C# for 5... There are certain features of C# that make it far easier and quicker to develop with. Compiled C# can be inspected from within the editor using reflection and so can help you more, giving you edit-time syntax checking and more extensive IntelliSense. The standard class libraries (.NET) are far more extensive and cohesive than C++'s STL. If you spend some time developing with the latest Visual Studio and ReSharper you'd see what I mean.
Tony Lambert
Also I think with C# more developers will be classed as good because it is easier to get to grips with. I think it has always been hard to find excellent C++ developers because it is harder to master.
Tony Lambert
+1  A: 

One of the most interesting things about C++ is that its performance numbers are not better, but more reliable.

It's not necessarily faster than Java/C#/..., but it is consistent across runs.

Like in networking, sometimes the throughput isn't as important as a stable latency.

Steve Schnepp
+1  A: 

The elephant in the room here is the fact that C++ is faster than Java.

We all know it. But we also know that if we state it plainly, as I just did, we can't pretend to engage in a meaningful debate about this undebatable topic. How much faster is C++ than Java for your application? That has the ring of a debatable topic, but, alas, it will always be hypothetical unless you implement your application in both languages, at which point there will be no room for debate.

Let's go back to your first design meeting: The hard requirement for your project is high performance. Everyone in the room will think "C++" and a handful of other compiled languages. The guy in the room who suggests Java or C# will have to justify it with evidence (i.e., a prototype), not with hypotheticals, not with claims made by the vendors, not with statements on programmer gossip sites, and certainly not with "hello world" benchmarks.

As it stands now, you have to move forward with what you know, not with what is hypothetically possible.

John
A: 

Nikie wrote: “Could you explain what you can do with C++ threads and not with e.g. .NET threads?”

Threading with .NET can do virtually everything C++ threading can, except:

  1. Efficient execution of COM-encapsulated binary code. For example, sensitive algorithms that might have to be kept secret from application developers. (Might be relevant in HFT)
  2. Creation of lean threads that do not exhaust system resources with chunky building blocks – wrapped OS APIs and synchronization & signaling OS primitives. (Extremely relevant with parallel algorithms for time-optimization of performance in HFT)
  3. Scaling up the throughput of a business process application 10 or more times on the same hardware and with the same latency. (Not relevant in HFT)
  4. Scaling up 100 and more times the number of concurrently handled user interactions per unit of hardware. (Not relevant in HFT)

Using more CPU cores cannot fully compensate for the exhaustion of system resources by the building blocks of .NET, since more CPU cores are a guarantee for the appearance of memory contention.

Ivan Klianev