views:

280

answers:

7

I am building a trading portfolio management system that is responsible for production, optimization, and simulation of non-high frequency trading portfolios (dealing with 1min or 3min bars of data, not tick data).

I plan on employing Amazon web services to take on the entire load of the application.

I have four choices that I am considering as language.

  1. Java
  2. C++
  3. C#
  4. Python

Here are the extremes of the project scope. This isn't how it will be, maybe ever, but it's within the scope of the requirements:

  • Weekly simulation of 10,000,000 trading systems.
  • (Each trading system is expected to have its own data mining methods, including feature selection algorithms which are extremely computationally-expensive. Imagine 500-5000 features using wrappers. These are not run often by any means, but it's still a consideration)
  • Real-time production of portfolio w/ 100,000 trading strategies
  • Taking in 1 min or 3 min data from every stock/futures market around the globe (approx 100,000)
  • Portfolio optimization of portfolios with up to 100,000 strategies. (rather intensive algorithm)

Speed is a concern, but I believe that Java can handle the load.

I just want to make sure that Java CAN handle the above requirements comfortably. I don't want to do the project in C++, but I will if it's required.

The reason C# is on there is because I thought it was a good alternative to Java, even though I don't like Windows at all and would prefer Java if all things are the same.

Python - I've read some things on PyPy and Psyco that claim Python can be optimized with JIT compiling to run at near C-like speeds... That's pretty much the only reason it is on this list, besides the fact that Python is a great language and would probably be the most enjoyable language to code in, which is not a factor at all for this project, but a perk.

To sum up:

  • real time production
  • weekly simulations of a large number of systems
  • weekly/monthly optimizations of portfolios
  • large numbers of connections to collect data from

There is no dealing with millisecond or even second based trades. The only consideration is whether Java can deal with this kind of load when it is spread out over the necessary number of EC2 servers.

Thank you guys so much for your wisdom.

+5  A: 

Pick the language you are most familiar with. If you know them all equally and speed is a real concern, pick C.

Bryan Oakley
To be honest: If he knew them all equally well, then he probably wouldn't need to ask here.
Joachim Sauer
That's correct. I only know Java. The rest I only have a superficial understanding of.
Bijan
+4  A: 

Write it in your preferred language. To me that sounds like Python. When you start running the system, you can profile it and see where the bottlenecks are. If it's still not acceptable after some basic optimisations, you can rewrite portions in C.
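As a rough illustration of the "profile first" step, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The functions `slow_signal` and `backtest` are hypothetical stand-ins made up for this example, not part of any real trading system; the naive moving average is planted deliberately so that it shows up as the hotspot.

```python
import cProfile
import io
import pstats

WINDOW = 50  # hypothetical look-back length, in bars


def slow_signal(prices):
    """Naive trailing average: re-sums the whole window on every bar."""
    out = []
    for i in range(WINDOW, len(prices)):
        out.append(sum(prices[i - WINDOW:i]) / WINDOW)
    return out


def backtest(prices):
    """Count the bars that close above their trailing average."""
    signal = slow_signal(prices)
    return sum(1 for p, s in zip(prices[WINDOW:], signal) if p > s)


def profile_backtest(prices):
    """Run backtest() under the profiler and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(backtest, prices)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()
```

Printing the report returned by `profile_backtest` lists the functions by cumulative time. Whatever ends up at the top of that list on the real system is the candidate for optimisation or a rewrite in C.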

Another option is to write this in IronPython to take advantage of the CLR and DLR in .NET. Then you can leverage .NET 4 and the Parallel Extensions. If anything will give you performance increases, it'll be some flavour of threading, which .NET does extremely well.

Edit:

Just wanted to make this part clear. From the description, it sounds like parallel processing / multithreading is where the majority of the performance gains are going to come from.
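Since each trading system can be simulated independently, the work parallelises naturally. Here is a minimal sketch in plain CPython using `concurrent.futures`. The function `simulate_system` is a hypothetical placeholder (a seeded random walk) standing in for a real simulation, and the chunk size is an arbitrary assumption.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate_system(seed, n_bars=1000):
    """Hypothetical stand-in for one simulation: a seeded random walk."""
    rng = random.Random(seed)
    equity = 0.0
    for _ in range(n_bars):
        equity += rng.gauss(0.0, 1.0)
    return seed, equity


def simulate_all(seeds, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run every simulation on a pool and return {seed: final equity}.

    Processes are the default because the work is CPU-bound and CPython
    threads share one interpreter lock. ThreadPoolExecutor can be passed
    in instead for I/O-bound jobs such as polling data feeds.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches the tasks sent to each worker process
        # (it has no effect on a thread pool).
        return dict(pool.map(simulate_system, seeds, chunksize=64))
```

When using the default process pool from a script, the call (for example `simulate_all(range(10000))`) should sit under an `if __name__ == "__main__":` guard. The same pattern scales out across machines by giving each EC2 instance its own slice of the seeds.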

Josh Smeaton
+1, great comment, and Jython is also an option for Python on the JVM, if something in Java-space is specifically required.
cjrh
Python is not the answer for everything in this world. In his case I am sure Java / C++ are far better choices.
Andrei Ciobanu
I'm going to hang myself by saying C++ is a terrible choice based on the OP's requirements. If you're after speed, use C. C++ provides so many opportunities to hang yourself if you don't know it well. Java can be a great choice. But if the OP likes using Python, and can use it well, why wouldn't it be the choice for these circumstances if the infrastructure supports the performance goals?
Josh Smeaton
@Josh: Absolutely - one of Python's great strengths (to a Python and C programmer) is that you can code everything at a high level, and dip down to the low level only when necessary. (+1)
Skilldrick
A: 

I would go with pypy. If not, http://lolcode.com/.

fastcodejava
-1 for lolcode (yes, I know it's a joke, but it's not really helpful).
Skilldrick
+3  A: 

I would pick Java for this task. In terms of RAM, the difference between Java and C++ is that in Java, each object carries a header overhead of 8 bytes on the Sun 32-bit JVM, or 12 bytes on the Sun 64-bit JVM with compressed pointers. So if you have millions of objects flying around, this can make a difference. In terms of speed, Java and C++ are almost equal at that scale.

So the more important thing for me is the development time. If you make a mistake in C++, you get a segmentation fault (and sometimes you don't even get that), while in Java you get a nice Exception with a stack trace. I have always preferred this.

In C++ you can have collections of primitive types, which Java's standard collections don't support (only arrays can hold primitives). You would have to use external libraries to get them.

If you have real-time requirements, the Java garbage collector may be a nuisance, since it takes some minutes to collect a 20 GB heap, even on machines with 24 cores. But if you don't create too many temporary objects during runtime, that should be fine, too. It's just that a garbage collection pause can hit your program when you least expect it.

Roland Illig
Thank you. I think I am going to go with Java.
Bijan
Can this same argument apply to C#/.NET?
mgroves
+3  A: 

Why only one language for your system? If I were you, I would build the entire system in Python, but use C or C++ for the performance-critical components. That way, you will have a very flexible and extensible system with fast-enough performance. You can even find tools that generate wrappers automatically (e.g. SWIG, Cython). Python and C/C++/Java/Fortran are not competing with each other; they complement each other.

Daehyok Shin
+4  A: 

While I am a huge fan of Python and personally I'm not a great lover of Java, in this case I have to concede that Java is the right way to go.

For many projects Python's performance just isn't a problem, but in your case even minor performance penalties will add up extremely quickly. I know this isn't a real-time simulation, but even for batch processing it's still a factor to take into consideration. If it turns out the load is too big for one virtual server, an implementation that's twice as fast will halve your virtual server costs.

For many projects I'd also argue that Python will allow you to develop a solution faster, but here I'm not sure that would be the case. Java has world-class development tools and top-drawer enterprise-grade frameworks for parallel processing and cross-server deployment, and while Python has solutions in this area, Java clearly has the edge. You also have architectural options with Java that Python can't match, such as JavaSpaces.

I would argue that C and C++ impose too much of a development overhead for a project like this. They're viable in that if you are very familiar with those languages I'm sure it would be doable, but other than the potential for higher performance, they have nothing else to bring to the table.

C# is just a rewrite of Java. That's not a bad thing if you're a Windows developer and if you prefer Windows I'd use C# rather than Java, but if you don't care about Windows there's no reason to care about C#.

Simon Hibbs
A: 

It is useful to look at the inner loop of your numerical code. After all you will spend most of your CPU-time inside this loop.

If the inner loop is a matrix operation, then I suggest Python and SciPy, but if the inner loop is not a matrix operation, then I would worry about Python being slow. (Or maybe I would wrap C++ in Python using SWIG or boost::python.)
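To make the distinction concrete, here is a small sketch that computes the same quantity two ways: with an explicit Python loop, and as a single matrix expression in NumPy (the array library SciPy builds on). Portfolio variance w'Σw is used only as an assumed example of an inner loop from the question's portfolio optimization; the function names are made up for illustration.

```python
import numpy as np


def portfolio_variance_loop(weights, cov):
    """Pure-Python double loop: every multiply runs in the interpreter."""
    n = len(weights)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += weights[i] * cov[i][j] * weights[j]
    return total


def portfolio_variance_matrix(weights, cov):
    """Same quantity as one matrix expression, evaluated in compiled code."""
    w = np.asarray(weights, dtype=float)
    c = np.asarray(cov, dtype=float)
    return float(w @ c @ w)
```

Both functions return the same number, but the second one pushes the whole inner loop into NumPy's compiled routines. If the inner loop cannot be phrased this way, the interpreter overhead of the first version is what you are left with.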

The benefit of Python is that it is easy to debug, and you save a lot of time by not having to compile all the time. This is especially useful for a project where you spend a lot of time programming deep internals.

nielsle