views: 3473
answers: 24
I'm currently deciding on a platform to build a scientific computational product on, and am deciding between C#, Java, and plain C with Intel's compiler on Core2 Quad CPUs. It's mostly integer arithmetic.

My benchmarks so far show Java and C are about on par with each other, and .NET/C# trails by about 5%; however, a number of my coworkers are claiming that .NET with the right optimizations will beat both of these, given enough time for the JIT to do its work.

I always assumed that the JIT would have done its job within a few minutes of the app starting (probably a few seconds in my case, as it's mostly tight loops), so I'm not sure whether to believe them.
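To illustrate what I mean by warm-up, here is a sketch of the kind of harness I have in mind (the class name, workload, and iteration counts are arbitrary; this is not a rigorous benchmark):

```java
public class WarmupBench {
    // The hot loop under test: simple integer arithmetic.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31) ^ (i >> 3);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Warm-up: give the JIT a chance to compile the hot loop
        // before any timing is done.
        for (int i = 0; i < 10; i++) {
            work(1_000_000);
        }
        long start = System.nanoTime();
        long result = work(10_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " elapsedMs=" + elapsedMs);
    }
}
```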

Can anyone shed any light on the situation? Would .NET beat Java? Or am I best just sticking with C at this point?

The code is highly multithreaded and data sets are several terabytes in size.

Haskell/Erlang etc. are not options in this case, as there is a significant quantity of existing legacy C code that will be ported to the new system, and porting C to Java/C# is a lot simpler than porting it to Haskell or Erlang. (Unless of course these provide a significant speedup.)

Edit: We are considering moving to C# or Java because they may, in theory, be faster. Every percent we can shave off our processing time saves us tens of thousands of dollars per year. At this point we are just trying to evaluate whether C, Java, or C# would be faster.

+5  A: 

If there is already a significant quantity of legacy C code that will be added to the system then why move to C# and Java?

In response to your latest edit about wanting to take advantage of any improvements in processing speed: your best bet would be to stick with C, as it runs closer to the hardware than C# and Java, which have the overhead of a runtime environment to deal with. The closer to the hardware you can get, the faster you should be able to run. Higher-level languages such as C# and Java will give you quicker development times, but C, or better yet assembly, will give you quicker processing time at the cost of longer development time.

mezoid
+22  A: 

I'm sorry, but that is not a simple question. It would depend a lot on what exactly is going on. C# is certainly no slouch, and you'd be hard-pressed to say "Java is faster" or "C# is faster". C is a very different beast: it may have the potential to be faster if you get it right, but in most cases it'll be about the same, just much harder to write.

It also depends how you do it - locking strategies, how you do the parallelization, the main code body, etc.

Re the JIT - you could use NGEN to flatten this, but yes; if you are hitting the same code it should be JITted very early on.

One very useful feature of C#/Java (over C) is that they have the potential to make better use of the local CPU (optimizations etc), without you having to worry about it.

Also - with .NET, consider things like "Parallel Extensions" (to be bundled in 4.0), which gives you a much stronger threading story (compared to .NET without PFX).

Marc Gravell
+1 for mentioning Parallel Extensions. As has already been pointed out, the number of cores available will increase over time. Parallel Extensions should allow any existing code to make the best use of those cores as they are added without any effort from the developer at all.
Grant Wagner
"One very useful feature of C#/Java (over C) is that they have the potential" -- Please explain this: how can a C#/Java VM/JIT make better (i.e. faster) optimizations than an optimizing C compiler that targets the native CPU?
mctylr
@mctylr they do not need to deal with aliasing, they have access to runtime behaviour when optimizing, and they have more freedom to screw around with the internals (like escape analysis) because, unlike C, the internals are hidden away. This is very much still potential, but it's getting there fast
ShuggyCoUk
+5  A: 

It is going to depend very much on what you are doing specifically. I have Java code that beats C code. I have Java code that is much slower than C++ code (I don't do C#/.NET so cannot speak to those).

So, it depends on what you are doing, I am sure you can find something that is faster in language X than language Y.

Have you tried running the C# code through a profiler to see where it is taking the most time (same with Java and C while you are at it)? Perhaps you need to do something different.

The Java HotSpot VM is more mature (its roots go back to at least 1994) than the .NET one, so it may come down to the code generation abilities of the two.

TofuBeer
Also, Java has better support on different OSes (e.g. Linux).
bwalliser
If your "different OS" is linux, then not really - mono is very well supported there. But certainly, java has better penetration into the more... "obscure" devices. For server work (typically windows or linux/unix), there isn't much difference between C# and Java (in this respect).
Marc Gravell
+1  A: 

I would go with C# (or Java) because your development time will probably be much faster than with C. If you end up needing extra speed then you can always rewrite a section in C and call it as a module.

Nathan Reed
+2  A: 

Actually it is 'Assembly language'.

Dhana
Why was this marked down? Given the strange question, it's an appropriate answer!
Daniel Earwicker
+1 for people marking it down; not a bad answer if you have people with ASM skills.
gatoatigrado
Did he say "I'll do anything to make it faster"? ASM is appropriate for a question like that!
gbarry
Moore's law will make your hardware go faster, and will do it faster than you can code your app.
gbarry
No, Moore's Law will make your hardware more *dense* for the same price. It is not, and never has been, about speed (quantum effects are a bitch)
ShuggyCoUk
+11  A: 

I'm honestly surprised at those benchmarks.

In a computationally intensive product I would place a large wager on C to perform faster. You might write code that leaks memory like a sieve, and has interesting threading related defects, but it should be faster.

The only reason I could think that Java or C# would be faster is a short run length on the test. If little or no GC happened, you'll avoid the overhead of actually deallocating memory. If the process is iterative or parallel, try sticking a GC.Collect wherever you think you're done with a bunch of objects (after setting things to null or otherwise removing references).
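The Java analogue of that pattern looks like the sketch below (purely illustrative: the class name is made up, and `System.gc()` is only a hint that the VM is free to ignore):

```java
public class GcHint {
    // Build a throwaway chunk of data, as a stand-in for per-pass work.
    static int[] buildChunk(int size) {
        int[] chunk = new int[size];
        for (int i = 0; i < size; i++) chunk[i] = i;
        return chunk;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int pass = 0; pass < 5; pass++) {
            int[] chunk = buildChunk(100_000);
            sum += chunk[chunk.length - 1];
            chunk = null;          // drop the reference...
            System.gc();           // ...and *suggest* a collection (the VM may ignore this)
        }
        System.out.println(sum);   // prints 499995
    }
}
```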

Also, if you're dealing with terabytes of data, my opinion is you're going to be much better off with the deterministic memory allocation you get with C. If you deallocate roughly close to when you allocate, your heap will stay largely unfragmented. With a GC environment, you may very well end up with your program using far more memory after a decent run length than you would guess, just because of fragmentation.

To me this sounds like the sort of project where C would be the appropriate language, but would require a bit of extra attention to memory allocation/deallocation. My bet is that C# or Java will fail if run on a full data set.

Darren Clark
Your preconceptions are out of date. Modern garbage collectors are very good at optimizing deallocation, especially when it happens close to allocation. They can even outperform C-style malloc/free.
Michael Borgwardt
Could be. What I do know is that if I allocate a variable-sized buffer (around 8K) in a loop with C# and don't do a GC.Collect, I die. Fast. Even when releasing the buffer each iteration. LOH ftl.
Darren Clark
8K isn't enough to get into the LOH as far as I'm aware. Please post a short but complete program to demonstrate the problem.
Jon Skeet
Hrm... You're both right, and I'm wrong here. I have an old program that had problems, but it was allocating more than 8K, and I was misremembering the LOH object size. It also was on 1.0, and I can't repro the problem on 3.5. So Michael's right as well. I'm still surprised at the benchmarks though.
Darren Clark
1.0 LOH was badly broken in many ways, we had similar problems. 1.1 mostly fixed it, 2.0 all problems went away
ShuggyCoUk
A: 

If you are writing highly multithreaded code, I would recommend you take a look at the upcoming Task Parallel Library (TPL) for .NET and the Parallel Patterns Library (PPL) for native C++ applications. That will save you a lot of issues with threading/deadlocking and all the other problems that you would otherwise spend a lot of time digging into and solving yourself. For my part, I truly believe that memory management in the managed world will be more efficient and will beat native code in the long term.

Magnus Johansson
+3  A: 

Depends on what kind of application you are writing. Try The Computer Language Benchmarks Game

http://shootout.alioth.debian.org/u32q/benchmark.php?test=all&lang=csharp&lang2=java&box=1 http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=csharp&lang2=java&box=1

J-16 SDiZ
Please pay extra attention to the C# results because they are run on Mono's .NET. I am not saying Mono is slower than MS .NET, but there MAY be a difference in speed.
Canton
A: 

If much of your code is in C, why not keep it? In principle and by design it's obvious that C is faster. The managed languages may close the gap over time, but they always have more levels of indirection and "safety". C is fast because it's "unsafe": just think about bounds checking. Interfacing to C is supported in every language, so I cannot see why one would not just wrap the C code up, if it's still working, and use it from whatever language you like.

Friedrich
You should read up upon what a modern Java JVM can do in terms of optimization. You will be astonished.
Thorbjørn Ravn Andersen
+47  A: 

The key piece of information in the question is this:

Every percent we can shave off our processing time saves us tens of thousands of dollars per year

So you need to consider how much it will cost to shave each percent off. If that optimization effort costs tens of thousands of dollars per year, then it isn't worth doing. You could make a bigger saving by firing a programmer.

With the right skills (which today are rarer and therefore more expensive) you can hand-craft assembler to get the fastest possible code. With slightly less rare (and expensive) skills, you can do almost as well with some really ugly-looking C code. And so on. The more performance you squeeze out of it, the more it will cost you in development effort, and there will be diminishing returns for ever greater effort. If the profit from this stays at "tens of thousands of dollars per year" then there will come a point where it is no longer worth the effort. In fact I would hazard a guess you're already at that point because "tens of thousands of dollars per year" is in the range of one salary, and probably not enough to buy the skills required to hand-optimize a complex program.

I would guess that if you have code already written in C, the effort of rewriting it all as a direct translation in another language will be 90% wasted effort. It will very likely perform slower simply because you won't be taking advantage of the capabilities of the platform, but instead working against them, e.g. trying to use Java as if it was C.

Also within your existing code, there will be parts that make a crucial contribution to the running time (they run frequently), and other parts that are totally irrelevant (they run rarely). So if you have some idea for speeding up the program, there is no economic sense in wasting time applying it to the parts of the program that don't affect the running time.

So use a profiler to find the hot spots, and see where time is being wasted in the existing code.

Update, now that I've noticed the reference to the code being "multithreaded":

In that case, if you focus your effort on removing bottlenecks so that your program can scale well over a large number of cores, then it will automatically get faster every year at a rate that will dwarf any other optimization you can make. This time next year, quad cores will be standard on desktops. The year after that, 8 cores will be getting cheaper (I bought one over a year ago for a few thousand dollars), and I would predict that a 32 core machine will cost less than a developer by that time.
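The "scale with the cores" idea can be sketched in Java as follows (a hypothetical workload; the point is that the pool is sized to whatever cores the machine has, so the same code speeds up as core counts grow):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScaleWithCores {
    // Sum an array by splitting it into one chunk per available core.
    public static long parallelSum(long[] data) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        int chunk = (data.length + cores - 1) / cores;
        List<Future<Long>> parts = new ArrayList<>();
        for (int c = 0; c < cores; c++) {
            final int lo = c * chunk;
            final int hi = Math.min(data.length, lo + chunk);
            parts.add(pool.submit(() -> {
                long s = 0;
                for (int i = lo; i < hi; i++) s += data[i];
                return s;
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 10;
        System.out.println(parallelSum(data)); // prints 4500000
    }
}
```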

Daniel Earwicker
I agree, in fewer words. Furthermore, I think you can save lots of money either by utilizing GPUs, if possible, or by getting help from Intel themselves if it's a cool/big project
Mafti
It's not clear that there's going to be enough bandwidth on 32-core machines to do data-intensive computing on all the cores. It might also be worth it to look at distributed-memory scaling, like MapReduce or MPI. This could get him scaling up to thousands of cores *now*. See below.
tgamblin
A: 

Ref; "My benchmarks so far show Java and C are about on par with each other"

Then your benchmarks are severely flawed...

C will ALWAYS be orders of magnitude faster than both C# and Java unless you do something seriously wrong...!

PS! Notice that this is not an attempt to bully either C# or Java. I like both Java and C#, and there are other reasons why, for many problems, you would choose either Java or C# instead of C. But neither Java nor C# would, in a correctly written test, EVER be able to perform at the same speed as C...

Edited because of the sheer number of comments arguing against my rhetoric

Compare these two buggers...

C#

public class MyClass
{
   public int x;

   public static void Main()
   {
      MyClass[] y = new MyClass[1000000];
      for( int idx=0; idx < 1000000; idx++)
      {
          y[idx] = new MyClass();
          y[idx].x = idx;
      }
   }
}

against this one (C)

struct MyClass
{
   int x;
};

int main(void)
{
   /* note: ~4 MB on the stack; fine under typical Linux defaults,
      but may overflow smaller default stacks (e.g. 1 MB on Windows) */
   struct MyClass y[1000000];
   for (int idx = 0; idx < 1000000; idx++)
   {
      y[idx].x = idx;
   }
   return 0;
}

The C# version first of all needs to store its array on the heap, while the C version stores the array on the stack. Storing stuff on the stack is merely changing the value of an integer, while storing stuff on the heap means finding a big enough chunk of memory and potentially traversing memory for a pretty long time.

Now, mostly C# and Java allocate huge chunks of memory which they keep spending from until they run out, which makes this logic execute faster. But even then, comparing that against changing the value of an integer is like comparing an F16 against an oil tanker, speed-wise...

Second of all, in the C version, since all those objects are already on the stack, we don't need to explicitly create new objects within the loop. Yet again, for C# this is a "look for available memory" operation, while for the C version it is a no-op (a do-nothing operation)

Third of all is the fact that the C version will automatically delete all these objects when they go out of scope. Yet again, this is an operation which ONLY CHANGES THE VALUE OF AN INTEGER, which on most CPU architectures takes between 1 and 3 CPU cycles. The C# version doesn't do that, but when the garbage collector kicks in and needs to collect those items, my guess is that we're talking about MILLIONS of CPU cycles...

Also, the C version compiles directly to x86 code (on an x86 CPU), while the C# version is first compiled to IL. Then, when executed, it has to be JIT compiled, which alone probably takes orders of magnitude longer than executing the C version.

Now, some wise guy could probably execute the above code and measure CPU cycles. However, there's basically no point in doing that because mathematically it's proven that the managed version would probably take several million times the number of CPU cycles of the C version. So my guess is that we're talking about 5-8 orders of magnitude slower in this example. And sure, this is a "rigged test" in that I "looked for something to prove my point"; however, I challenge those that commented badly against me on this post to create a sample which does NOT execute faster in C, and which also doesn't use constructs you would normally never use in C because "better alternatives" exist.

Note that C# and Java are GREAT languages. I prefer them over C ANY TIME OF THE DAY. But NOT because they're FASTER. Because they are NOT. They are ALWAYS slower than C and C++. Unless you've coded blindfolded in C or C++...

Edit;

C# of course has the struct keyword, which would seriously change the speed of the above C# version if we changed the C# class to a value type by using the keyword struct instead of class. The struct keyword means that C# stores new objects of the given type on the stack, which for the above sample would increase the speed seriously. Still, the above sample also happens to feature an array of these objects.

Even if we went through and optimized the C# version like this, we would still end up with something several orders of magnitude slower than the C version...
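As an aside, the same flat-versus-heap distinction exists on the JVM even without a struct keyword: an `int[]` is one contiguous block of memory, while an `Integer[]` is an array of references to separately heap-allocated objects. A rough sketch (the class and method names are just for illustration):

```java
public class FlatVsBoxed {
    // One contiguous allocation; no per-element objects.
    public static long sumFlat(int n) {
        int[] y = new int[n];
        for (int i = 0; i < n; i++) y[i] = i;
        long s = 0;
        for (int v : y) s += v;
        return s;
    }

    // n separate heap objects (modulo the small-value Integer cache),
    // plus an array of references to them.
    public static long sumBoxed(int n) {
        Integer[] y = new Integer[n];
        for (int i = 0; i < n; i++) y[i] = i;
        long s = 0;
        for (Integer v : y) s += v;
        return s;
    }

    public static void main(String[] args) {
        // Same answer, very different allocation behaviour.
        System.out.println(sumFlat(1_000_000) == sumBoxed(1_000_000)); // prints true
    }
}
```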

A well-written piece of C code will ALWAYS be faster than C#, Java, Python, and whatever-managed-language-you-choose...

As I said, I love C#, and most of the work I do today is in C#, not C. However, I don't use C# because it's faster than C. I use C# because I don't need the speed gain C gives me for most of my problems.

Both C# and Java are, though, ridiculously slower than C, and C++ for that matter...

Thomas Hansen
Do you have a reference? Once Java/C# code gets JIT'd into native machine-code, I can think of no reason for it to be "orders of magnitude" slower than machine-code compiled from C source.
Blorgbeard
same; writing code that compiles for the common case could potentially outperform the compile-time c strategy.
gatoatigrado
simply because it has to cater for all eventualities - it can't let you write crappy, memory-leaking, thread-unsafe code. C can. So C doesn't need the same kind of safety net, and obviously, all that checking and 'make safe' stuff means it won't be quite as fast as C can be.
gbjbaanb
obviously, for most applications this really doesn't matter, but for this particular application, it sounds like it will.
gbjbaanb
-1: Poor/naïve C will easily be outperformed by good .NET/JVM code. If memory allocations dominate the runtime, even good C code may be outperformed (under a GC, allocations are extremely fast).
Richard
-1 for ridiculous overstatement. "ALWAYS be orders of magnitude"?
ShuggyCoUk
-1: Faster? Usually. By "orders of magnitude"? Not in your wildest dreams.
Juliet
See http://blogs.msdn.com/ricom/archive/2005/05/19/420158.aspx. It took 5 unmanaged versions and a bug fix to be as fast as the first unoptimized C# version. Only after 6 unmanaged optimizations did C++ beat C#, and it wasn't by orders of magnitude (yes, I'm aware C++ isn't C).
Grant Wagner
And those C# times included the CLR startup time, which would probably be irrelevant in the program described in the original question.
Grant Wagner
C# and Java are *MANAGED* languages. They both rely on garbage collectors. They both need to box and unbox value types. And they have no concept of storing stuff on the stack. To compare them speed-wise against C is like comparing an F16 against an oil tanker. Ridiculous...! Read books...!
Thomas Hansen
You really aren't helping yourself. Try changing class to struct in the C# example... Second, *current* JVM implementations are capable of doing this transparently for you in certain situations, and the CLR is heading that way too. Most real-world apps are not normally bound by this anyway
ShuggyCoUk
Incidentally, I speak as someone who does sometimes jump through hoops in C# on my hot path to avoid allocation on the heap; it's not that hard. Saying orders of magnitude is just plain wrong - it's hyperbole, which you shouldn't be surprised to see shot down by the rational sorts that abound here
ShuggyCoUk
you really should consider deleting this one and getting your Peer Pressure badge. "And they have no concept of storing stuff on the stack" - I suggest you look at C# structs, stackalloc and escape analysis... perhaps do some of that reading you suggest others do.
ShuggyCoUk
@ShuggyCoUk - You're right. Using value types here would help, and is probably right. But the above samples would still be orders of magnitude slower. In the C sample we can easily *COUNT* the CPU cycles, and comparing that against our estimate for the C# version would still be a slaughterhouse...
Thomas Hansen
If you think you can count CPU cycles by looking at code these days you are sorely mistaken. I suggest further reading on modern pipelined superscalar CPUs, multi-level caching, and the compiler techniques used to work with them. Have you even bothered to benchmark?
ShuggyCoUk
Also, every time you edit you just show your lack of knowledge. C# lets you stackalloc, which gets you a functionally identical program. The only time Java/C# will be killed by languages like C is when the initial start-up time matters, and the OP is clearly not in that situation.
ShuggyCoUk
-1: I've compared, and easily found cases where C# runs much faster than C BECAUSE of GC. In C you have to allocate/free memory one block at a time, but the GC runs batched, extremely optimized operations. Try running similar code (not your sample, although I don't think your sample would come out any different). I also doubt that you know the meaning of "orders of magnitude".
Iravanchi
+1 It's very brave to hold your point against the horde who simply don't understand C.
Andrei Ciobanu
+7  A: 

Quite some time ago, Raymond Chen and Rico Mariani had a series of blog posts incrementally optimising loading a file into a dictionary tool. While .NET was quicker early on (i.e. it was easy to make it quick), the C/Win32 approach eventually became significantly faster - but at considerable complexity (e.g. using custom allocators).

In the end, the answer to which is faster will depend heavily on how much time you are willing to spend eking every microsecond out of each approach. That effort (assuming you do it properly, guided by real profiler data) will make a far greater difference than the choice of language/platform.


The first and last performance blog entries (lots of room for error here):

(The last link gives an overall summary of the results and some analysis.)

Richard
+9  A: 

Don't worry about language; parallelize!

If you have a highly multithreaded, data-intensive scientific code, then I don't think worrying about language is the biggest issue for you. I think you should concentrate on making your application parallel, especially making it scale past a single node. This will get you far more performance than just switching languages.

As long as you're confined to a single node, you're going to be starved for compute power and bandwidth for your app. On upcoming many-core machines, it's not clear that you'll have the bandwidth you need to do data-intensive computing on all the cores. You can do computationally intensive work (like a GPU does), but you may not be able to feed all the cores if you need to stream a lot of data to every one of them.

I think you should consider two options:

  1. MapReduce
    Your problem sounds like a good match for something like Hadoop, which is designed for very data-intensive jobs.

    Hadoop has scaled to 10,000 nodes on Linux, and you can shunt your work off either to someone else's (e.g. Amazon's, Microsoft's) or your own compute cloud. It's written in Java, so as far as porting goes, you can either call your existing C code from within Java, or you can port the whole thing to Java.

  2. MPI
    If you don't want to bother porting to MapReduce, or if for some reason your parallel paradigm doesn't fit the MapReduce model, you could consider adapting your app to use MPI. This would also allow you to scale out to (potentially thousands) of cores. MPI is the de-facto standard for computationally intensive, distributed-memory applications, and I believe there are Java bindings, but mostly people use MPI with C, C++, and Fortran. So you could keep your code in C and focus on parallelizing the performance-intensive parts. Take a look at OpenMPI for starters if you are interested.
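The MapReduce shape itself can be prototyped in-process before committing to Hadoop or MPI. A toy Java sketch of the map (split) and reduce (count) phases - emphatically not a distributed system, and the class name is made up:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MiniMapReduce {
    // Map: split lines into words; Reduce: count occurrences per word.
    // parallelStream() spreads the map phase across available cores.
    public static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat sat", "the cat", "sat");
        // counts: the=2, cat=2, sat=2
        System.out.println(wordCount(lines));
    }
}
```

The same map/reduce split is what Hadoop distributes across nodes; shaping the problem this way first makes the later port mechanical.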

tgamblin
+1, although never underestimate the amount of time it takes to distribute a complex program; it might turn out not to be worth the effort/cash!
Ed Woodcock
+2  A: 

I participated in a few of TopCoder's Marathon Matches, where performance was the key to victory.

My choice was C#. I think C# solutions placed slightly above Java and were slightly slower than C++... until somebody wrote code in C++ that was an order of magnitude faster. You were allowed to use the Intel compiler, and the winning code was full of SIMD instructions that you cannot replicate in C# or Java. But if SIMD is not an option, C# and Java should be good enough, as long as you take care to use memory correctly (e.g. watch for cache misses and try to limit the working set to the size of the L2 cache)
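The cache-miss point is easy to demonstrate in any language: traversing a 2-D array row-by-row walks memory sequentially, while going column-by-column strides across it - same result, very different cache behaviour. A rough Java sketch (names and sizes are arbitrary):

```java
public class CacheOrder {
    // Sequential access: consecutive elements share cache lines.
    public static long sumRowMajor(int[][] m) {
        long s = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                s += m[i][j];
        return s;
    }

    // Strided access: each step jumps a whole row ahead,
    // touching a different cache line almost every time on large arrays.
    public static long sumColMajor(int[][] m) {
        long s = 0;
        for (int j = 0; j < m[0].length; j++)
            for (int i = 0; i < m.length; i++)
                s += m[i][j];
        return s;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = i + j;
        // Same sum, different access pattern (time them to see the gap).
        System.out.println(sumRowMajor(m) == sumColMajor(m)); // prints true
    }
}
```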

bh213
Re SIMD: .NET has SIMD support in the Mono implementation via the Mono.Simd namespace.
Konrad Rudolph
And Microsoft has indicated that they would very much like to also expose SIMD functionality. Mono in turn has indicated that they are willing to change their APIs to whatever MS comes up with (although the decent thing for MS to do, would be to grudgingly acknowledge that they missed the boat ...
Jörg W Mittag
... and just adopt the Mono API). Anyway, none of this is going to happen before .NET 5.0 or even later.
Jörg W Mittag
+1  A: 

To reiterate a comment: you should be using the GPU, not the CPU, if you are doing arithmetic-heavy scientific computing. Matlab with CUDA plugins would be much more awesome than Java or C# if Matlab licensing is not an issue. The nVidia documentation shows how to compile any CUDA function into a MEX file. If you need free software, I like PyCuda.

If however, GPUs are not an option, I personally like C for a lot of routines because the optimizations the compiler makes are not as complicated as JIT: you don't have to worry about whether a "class" becomes like a "struct" or not. In my experience, problems can usually be broken down such that higher-level things can be written in a very expressive language like Python (rich primitives, dynamic types, incredibly flexible reflection), and transformations can be written in something like C. Additionally, there's neat compiler software, like PLUTO (automatic loop parallelization and OpenMP code generation), and libraries like Hoard, tcmalloc, BLAS (CUBLAS for gpu), etc. if you choose to go the C/C++ route.

gatoatigrado
+5  A: 

You say "the code is multithreaded", which implies that the algorithms are parallelisable. Also, you say the "data sets are several terabytes in size".

Optimising is all about finding and eliminating bottlenecks.

The obvious bottleneck is the bandwidth to the data sets. Given the size of the data, I'm guessing that the data is held on a server rather than on a desktop machine. You haven't given any details of the algorithms you're using. Is the time taken by the algorithm greater than the time taken to read/write the data/results? Does the algorithm work on subsets of the total data?

I'm going to assume that the algorithm works on chunks of data rather than the whole dataset.

You have two scenarios to consider:

  1. The algorithm takes more time to process the data than it does to get the data. In this case, you need to optimise the algorithm.

  2. The algorithm takes less time to process the data than it does to get the data. In this case, you need to increase the bandwidth between the algorithm and the data.

In the first case, you need a developer who can write good assembler code to get the most out of the processors you're using, leveraging SIMD, GPUs and multiple cores where they're available. Whatever you do, don't just crank up the number of threads, because as soon as the number of threads exceeds the number of cores, your code goes slower! This is due to the added overhead of switching thread contexts. Another option is a SETI-like distributed processing system (how many PCs in your organisation are used for admin purposes - think of all that spare processing power!). As bh213 mentioned, C#/Java can be an order of magnitude slower than well-written C/C++ using SIMD etc., but that is a niche skillset these days.

In the latter case, where you're limited by bandwidth, then you need to improve the network connecting the data to the processor. Here, make sure you're using the latest ethernet equipment - 1Gbps everywhere (PC cards, switches, routers, etc). Don't use wireless as that's slower. If there's lots of other traffic, consider a dedicated network in parallel with the 'office' network. Consider storing the data closer to the clients - for every five or so clients use a dedicated server connected directly to each client which mirrors the data from the server.

If saving a few percent of processing time saves "tens of thousands of dollars" then seriously consider getting a consultant in, two actually - one software, one network. They should easily pay for themselves in the savings made. I'm sure there's many here that are suitably qualified to help.

But if reducing cost is the ultimate goal, then consider Google's approach - write code that keeps the CPU ticking over below 100%. This saves energy directly and indirectly through reduced cooling, thus costing less. You'll want more bang for your buck so it's C/C++ again - Java/C# have more overhead, overhead = more CPU work = more energy/heat = more cost.

So, in summary, when it comes to saving money there's a lot more to it than what language you're going to choose.

Skizz
+1 for bringing the issue of bandwidth to the data sets, rather than focusing on raw computational speed.
Grant Wagner
+3  A: 

One thing to note is that IF your application(s) would benefit from lazy evaluation, a functional programming language like Haskell may yield speedups of a totally different magnitude than the theoretically optimal structured/OO code, just by not evaluating unnecessary branches.

Also, if you are talking about the monetary benefit of better performance, don't forget to add the cost of maintaining your software into the equation.

ymihere
With F#/Scala you can target CLR/JVM and interoperate with C#/Java easily, allowing functional code to be used where it gives the most benefit.
Richard
Neither F# nor Scala are lazily evaluated, and at least in my experience, they run a tad slower than C# and Java.
Juliet
Lazy evaluation is so big a mindstep that it needs to be tried before you can tell how different it is :)
Thorbjørn Ravn Andersen
+1  A: 

I would consider what everyone else uses - not the folks on this site, but the folks who write the same kind of massively parallel, or super high-performance applications.

I find they all write their code in C/C++. So, just for this fact alone (ie. regardless of any speed issues between the languages), I would go with C/C++. The tools they use and have developed will be of much more use to you if you're writing in the same language.

Aside from that, I've found C# apps to have somewhat less than optimal performance in some areas, multithreading being one. .NET will try to keep you safe from threading problems (probably a good thing in most cases), but this safety will cost you in your specific case. (To test: try writing a simple loop that accesses a shared object using lots of threads. Run it on a single-core PC and you get better performance than on a multi-core box - .NET is adding its own locks to make sure you don't muck it up. I used Jon Skeet's singleton benchmark: the static-lock version took 1.5s on my old laptop and 8.5s on my super-fast desktop, and the lock version is even worse. Try it yourself.)
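The contended shared object described here can be sketched in Java as follows (a minimal illustration; the class name and counts are arbitrary). Every call serializes on one monitor, which is exactly where the multi-core slowdown comes from:

```java
public class ContendedCounter {
    private long count = 0;

    // Every thread takes the same monitor to update the shared state,
    // so all increments are serialized regardless of core count.
    public synchronized void increment() { count++; }

    public synchronized long get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        final ContendedCounter c = new ContendedCounter();
        int threads = 8;
        final int perThread = 100_000;
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) c.increment();
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        // Correct under contention, but 8 cores queued behind one lock.
        System.out.println(c.get()); // prints 800000
    }
}
```

Timing this on one core versus many shows the contention cost directly: more cores fighting over one lock can run slower, not faster.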

The next point is that with C you tend to access memory and data directly - nothing gets in the way - while with C#/Java you will use some of the many classes that are provided. These are good in the general case, but you're after the best, most efficient way to access your data (which, in your case, is a big deal with multi-terabyte datasets; those classes were not designed with such datasets in mind, they were designed for the common cases everyone else has). So again, you would be safer using C here - you'll never get the GC clogged up by a class that creates new strings internally while you read a couple of terabytes of data, if you write it in C!

So it may appear that C#/Java can give you benefits over a native application, but I think you'll find those benefits are only realised for the kind of line-of-business applications that are commonly written.

gbjbaanb
+2  A: 

Your question is poorly phrased (or at least the title is), because it implies this difference is endemic and holds true for all instances of Java/C#/C code.

Thankfully the body of the question is better phrased, because it presents a reasonably detailed explanation of the sort of thing your code is doing. It doesn't state which versions (or vendors) of the C#/Java runtimes you are using, nor the target architecture or machine the code will run on. These things make big differences.

You have done some benchmarking, this is good. Some suggestions as to why you see the results you do:

  • You aren't as good at writing performant C# code as you are at writing Java/C (this is not a criticism, or even necessarily likely, but it is a real possibility you should consider)
  • Later versions of the JVM have some serious optimizations that make uncontended locks extremely fast. This may skew things in Java's favour (especially in comparison with the C threading primitives you are using)
  • Since the java code seems to run well compared to the c code it is likely that you are not terribly dependent on the heap allocation strategy (profiling would tell you this).
  • Since the C# code runs less well than the Java code (and assuming the two are comparable), several possible reasons exist:
    • You are (needlessly) using virtual functions, which the JVM will inline but the CLR will not
    • The latest JVM does escape analysis, which may make some code paths considerably more efficient (notably those involving string manipulation whose lifetime is stack-bound)
    • Only the very latest 32-bit CLR will inline methods involving non-primitive structs
    • Some JVM JIT compilers use HotSpot-style mechanisms which attempt to detect the 'hot spots' of the code and spend more effort re-JITting them
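To illustrate the virtual-function point with a hedged, hypothetical example (not from the original answer): in Java every non-final instance method is virtual, yet HotSpot will typically still inline a call site like the one below as long as only one implementation has been loaded (a monomorphic call).

```java
public class VirtualCallDemo {
    static class Accumulator {
        long total = 0;
        // An overridable (virtual) method in Java terms. While the call site
        // below only ever sees this one implementation, the JIT can inline it.
        void add(int x) { total += x; }
    }

    public static void main(String[] args) {
        Accumulator acc = new Accumulator();
        for (int i = 1; i <= 1000; i++) {
            acc.add(i);  // monomorphic call site: a prime candidate for JIT inlining
        }
        System.out.println(acc.total);  // sum 1..1000 = 500500
    }
}
```

Loading a second subclass of `Accumulator` can force the JIT to deoptimize this call site back to a real virtual dispatch, which is the cost the bullet above refers to.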

Without an understanding of what your code spends most of its time doing, it is impossible to make specific suggestions. I can quite easily write code which performs much better under the CLR by using structs over objects, or by targeting runtime-specific features of the CLR like non-boxed generics, but this is hardly instructive as a general statement.

ShuggyCoUk
A: 

Note that for heavy computations there is a great advantage in having tight loops which can fit in the CPU's first level cache as it avoids having to go to slower memory repeatedly to get the instructions.

Even for level 2 cache, a large program like Quake 4 gets a 10% performance increase with 4 MB of level 2 cache versus 1 MB - http://www.tomshardware.com/reviews/cache-size-matter,1709-5.html

For these tight loops C is most likely the best as you have the most control of the generated machine code, but for everything else you should go for the platform with the best libraries for the particular task you need to do. For instance the netlib libraries are reputed to have very good performance for a very large set of problems, and many ports to other languages are available.
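As a generic illustration of why memory access patterns matter (a hypothetical example, not tied to the netlib libraries mentioned above): traversing a large 2D array row by row touches memory sequentially and stays cache-friendly, while traversing the same data column by column strides through memory and misses the cache far more often, even though the arithmetic is identical.

```java
public class TraversalOrder {
    public static void main(String[] args) {
        final int n = 2048;
        int[][] grid = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                grid[i][j] = 1;

        // Row-major traversal: consecutive elements, sequential memory access
        long rowSum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                rowSum += grid[i][j];

        // Column-major traversal: each access jumps to a different row array,
        // so cache lines are used far less effectively
        long colSum = 0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                colSum += grid[i][j];

        System.out.println(rowSum + " " + colSum);  // 4194304 4194304
    }
}
```

Both loops compute the same sum; timing them separately (with a warmed-up JIT) is what exposes the difference in cache behaviour.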

Thorbjørn Ravn Andersen
A: 

If every percentage point will really save you tens of thousands of dollars, then you should bring in a domain expert to help with the project. Well designed and written code with performance considered at the initial stages may be an order of magnitude faster, saving you 90%, or $900,000. I recently found a subtle flaw in some code that sped up a process by over 100 times. A colleague of mine found an algorithm that was running in O(n^3) and rewrote it to run in O(n log n). This tends to be where the huge performance savings are.

If the problem is so simple that you are certain that a better algorithm cannot be employed giving you significant savings, then C is most likely your best language.
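A toy illustration of the kind of algorithmic win described above (a hypothetical example, not the colleague's actual code): checking a list for duplicates pair by pair is O(n^2), while sorting first and scanning adjacent elements makes it O(n log n). Both give the same answer; only the growth rate differs.

```java
import java.util.Arrays;

public class DuplicateCheck {
    // O(n^2): compare every pair of elements
    static boolean hasDuplicateQuadratic(int[] a) {
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] == a[j]) return true;
        return false;
    }

    // O(n log n): sort a copy, then any duplicates must sit next to each other
    static boolean hasDuplicateSorted(int[] a) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        for (int i = 1; i < copy.length; i++)
            if (copy[i] == copy[i - 1]) return true;
        return false;
    }

    public static void main(String[] args) {
        int[] withDup = {5, 3, 9, 3, 7};
        int[] noDup = {5, 3, 9, 1, 7};
        System.out.println(hasDuplicateQuadratic(withDup) + " " + hasDuplicateSorted(withDup));
        System.out.println(hasDuplicateQuadratic(noDup) + " " + hasDuplicateSorted(noDup));
        // prints: true true
        //         false false
    }
}
```

On terabyte-scale data this difference dwarfs any constant-factor gain from picking one language over another.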

Stephen Nutt
+1  A: 

My preference would be C or C++ because I'm not separated from the machine language by a JIT compiler.

You want to do intense performance tuning, and that means stepping through the hot spots one instruction at a time to see what it is doing, and then tweaking the source code so as to generate optimal assembler.

If you can't get the compiler to generate what you consider good enough assembler code, then by all means write your own assembler for the hot spot(s). You're describing a situation where the need for performance is paramount.

What I would NOT do if I were in your shoes (or ever) is rely on anecdotal generalizations about one language being faster or slower than another. What I WOULD do is multiple passes of intense performance tuning along the lines of THIS and THIS and THIS. I have done this sort of thing numerous times, and the key is to iterate the cycle of diagnosis-and-repair because every slug fixed makes the remaining ones more evident, until you literally can't squeeze another cycle out of that turnip.

Good luck.

Added: Is it the case that there is some seldom-changing configuration information that determines how the bulk of the data is processed? If so, it may be that the program is spending a lot of its time re-interpreting the configuration info to figure out what to do next. If so, it is usually a big win to write a code generator that will read the configuration info and generate an ad-hoc program that can whizz through the data without constantly having to figure out what to do.
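A minimal sketch of that idea with hypothetical names (`RecordProcessor`, `forConfig` are invented for illustration): instead of re-interpreting the configuration for every record, select a specialized code path once, outside the loop. A full code generator goes further, but the hoisting alone shows the shape of the win.

```java
public class HoistConfig {
    interface RecordProcessor { long process(int value); }

    // Chosen once from the (rarely changing) configuration, then reused
    static RecordProcessor forConfig(String mode) {
        if (mode.equals("double")) {
            return new RecordProcessor() {
                public long process(int value) { return value * 2L; }
            };
        }
        return new RecordProcessor() {
            public long process(int value) { return value; }
        };
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};

        // The configuration lookup happens once, not once per record
        RecordProcessor p = forConfig("double");
        long total = 0;
        for (int v : data) {
            total += p.process(v);  // tight loop with no config parsing inside it
        }
        System.out.println(total);  // (1+2+3+4+5) * 2 = 30
    }
}
```

Generating and compiling the specialized loop from the configuration (rather than selecting it from hand-written variants) is the "ad-hoc program" version of the same trick.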

Mike Dunlavey
A: 

Surely the answer is to go and buy the latest PC with the most cores/processors you can afford. If you buy one of the latest 2x4-core PCs you will find it not only has twice as many cores as a quad core, but they also run 25-40% faster than the previous generation of processors/machines.

This will give you approximately a 150% speed-up. Far more than choosing between Java/C# and C. And what's more, you'll get the same again every 18 months if you keep buying new boxes!

You can sit there for months rewriting your code, or I could go down to my local PC store this afternoon and be running faster than all your efforts the same day.

Improving code quality/efficiency is good but sometimes implementation dollars are better spent elsewhere.

Tony Lambert
A: 

Writing in one language or another will only give you small speed ups for a large amount of work. To really speed things up you might want to look at the following:

  1. Buying the latest, fastest hardware.
  2. Moving from a 32-bit operating system to 64-bit.
  3. Grid computing.
  4. CUDA / OpenCL.
  5. Using compiler optimisations such as vectorisation.
Tony Lambert