You need the right algorithm to be successful in GCJ. I'd also argue it's much more important how fast you can write a program in a language than how fast that program runs - given the limited coding time allowed.
I used Python for GCJ and never had a case in which the language's speed "failed" me. One can say that Python is about 2x faster than Ruby (per the Computer Language Benchmarks Game), and when I used Psyco (a JIT compiler module) I got about a 5x speed-up - but that's small beer; the choice of language can bring only a linear speed increase. Say 10 times - big whoops.
The problems in GCJ, on the other hand, are designed with combinatorial explosion in mind, and the larger inputs lead to a much larger increase in the time (or memory) needed.
Take for example GCJ 2010 Round 1C, "Making Chess Boards". Assuming a square board for simplicity, the naive implementation has complexity O(n**4). The fast yet complicated judge implementation was described as O(n**2 log(n**2)). My simpler solution - which unfortunately came to me well after the round ended - is O(n**3). The difference between the 3rd and the 4th power might not seem significant, but the large input had a 512x512 board to process, for which the 3 algorithms have to do iterations on the order of:
naive:    68,719,476,736
mine:        134,217,728
judge's:       4,718,592
So on that input my implementation is roughly 30x slower than the judge's solution and ~500x faster than the naive code. On my oldish desktop (1.2 GHz Athlon) my Python code runs the large input in slightly under 4 minutes. I can speculate that the optimal solution would have run in under 10 seconds - but who cares, as long as you fit under the 8-minute limit?
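Those figures are just the three complexity formulas evaluated at n = 512 - here's a quick sanity check (using a base-2 log for the judge's bound, which is my assumption; the bounds themselves come from the discussion above):

```python
import math

n = 512  # side of the large-input board

naive = n ** 4                              # brute force, O(n**4)
mine = n ** 3                               # my solution, O(n**3)
judge = n ** 2 * int(math.log2(n ** 2))     # judge's O(n**2 log(n**2))

print(f"naive:   {naive:,}")    # 68,719,476,736
print(f"mine:    {mine:,}")     # 134,217,728
print(f"judge's: {judge:,}")    # 4,718,592

print(naive // mine)            # 512  -> the "~500x" gap
print(round(mine / judge))      # 28   -> the "roughly 30x" gap
```

Only the leading terms matter here; constant factors hidden by the big-O would shift the absolute numbers but not the story.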
On the other hand, the n**4 algorithm would take ~500 * 4 min ≈ 33 hours to run. That is very much not acceptable, and no optimizing compiler or over-clocked CPU is likely to save us from that morass.
Sure, some optimizations are possible - just adding psyco.full() decreased my runtime 5x, to 46 sec, and running the code on my faster laptop (2 GHz dual core) sped it up another 3x. That's "only" 15 times - and even if we generously call it a 50x speed-up, the "naive" algorithm on the large input would still take ~40 minutes, several times over the limit.
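The back-of-the-envelope budget behind that claim (the 4-minute measurement and the 8-minute window are from above; the 50x figure is the generous hypothetical):

```python
# Would a 50x constant-factor speed-up rescue the naive n**4 algorithm?
n3_runtime_min = 4      # measured: my O(n**3) Python code on the large input
work_ratio = 512        # at n = 512, n**4 does 512x the work of n**3
time_limit_min = 8      # GCJ large-input submission window

naive_min = n3_runtime_min * work_ratio   # ~2048 min, i.e. ~34 hours
sped_up_min = naive_min / 50              # ~41 min even with a 50x boost

print(round(naive_min / 60))              # ~34 (hours)
print(round(sped_up_min / time_limit_min))  # ~5 (times over the limit)
```

Constant-factor speed-ups divide the runtime; a worse exponent multiplies the work by a factor of n, so the division simply can't keep up.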
So if you have a bad algorithm, no optimization, compiler, or hardware will help. On the other hand, if you have the best algorithm in mind, you can use a computer 30 times slower than my nine-year-old PC and still get the results in time with a scripting language of the rank of Python/Ruby. Which, by the way, is the main goal of the GCJ problem writers: having contestants distinguish themselves on the basis of programming skill, not compiler/hardware/network connection.