You have to be careful with how you interpret Tim Sweeney's statements in that Ars interview. He's saying that having two separate platforms (the CPU and GPU), one suitable for single-threaded performance and one suitable for throughput-oriented computing, will soon be a thing of the past, as our applications and hardware grow towards one another.
The GPU grew out of technological limitations of the CPU, which made arguably more natural algorithms like ray-tracing and photon mapping nigh-undoable at reasonable resolutions and framerates. In came the GPU, with a wildly different and restrictive programming model, but maybe 2 or 3 orders of magnitude better throughput for applications painstakingly coded to that model. The two machine models had (and still have) essentially different coding styles, languages (OpenGL, DirectX, and shader languages vs. traditional desktop languages), and workflows. This makes code reuse, and even algorithm/programming-skill reuse, extremely difficult, and forces any developer who wants to exploit a dense parallel compute substrate into that restrictive programming model.
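To make the contrast concrete, here's a minimal sketch (in CUDA, purely as an illustration; the function names and sizes are mine, not anything from the interview) of the same trivial computation written once for the CPU model and once for the GPU model:

```
#include <cstdio>
#include <cuda_runtime.h>

// CPU model: a single thread walks the whole array in order.
void saxpy_cpu(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// GPU model: thousands of lightweight threads each handle one element.
__global__ void saxpy_gpu(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Managed memory keeps the sketch short; real code often stages
    // explicit host<->device copies instead.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy_cpu(n, 2.0f, x, y);                            // y[i] == 4.0 now
    saxpy_gpu<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);  // y[i] == 6.0 now
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The CPU version is one thread marching through memory; the GPU version only pays off because thousands of lightweight threads each take one element, and that restructuring (plus the launch/synchronize ceremony around it) is exactly the divergence in coding style described above.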
Finally, we're coming to a point where this dense compute substrate is nearly as programmable as a CPU. Although there is still a sizeable performance delta between one "core" of these massively parallel accelerators (the threads of execution within, for example, an SM on the G80 are not exactly cores in the traditional sense) and a modern x86 desktop core, two factors drive the convergence of these two platforms:
- Intel and AMD are moving towards more, simpler cores on x86 chips, converging the hardware with the GPU, whose units in turn are becoming more coarse-grained and programmable over time.
- This and other forces are spawning many new applications that can take advantage of Data- or Thread-Level Parallelism (DLP/TLP) and so make effective use of this kind of substrate (see the sketch after this list).
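To give a rough idea of what "effectively utilizing this kind of substrate" looks like in code, here is a hedged sketch (again CUDA; the kernel name and launch sizes are my own assumptions) of a grid-stride loop, a data-parallel idiom that is written once and simply soaks up however many cores or SMs the chip provides:

```
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread strides over the array, so correctness does
// not depend on how many threads the hardware actually runs in parallel.
__global__ void scale(int n, float a, float *data) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x)
        data[i] *= a;
}

int main() {
    const int n = 1 << 22;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // The launch geometry is a tuning knob, not a correctness requirement:
    // a wide chip chews through more of the grid at once, a narrow one
    // simply iterates more per thread.
    scale<<<8, 256>>>(n, 3.0f, data);
    cudaDeviceSynchronize();

    printf("data[n-1] = %f\n", data[n - 1]);  // expect 3.0
    cudaFree(data);
    return 0;
}
```

Whether you launch 8 blocks or 8192, the kernel stays correct; the hardware just exploits as much DLP/TLP as it has execution units for, which is the property that makes the two platforms look more and more alike.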
So, what Tim was saying is that the two distinct platforms will converge, to an even greater extent than, for instance, OpenCL affords. A salient quote from the interview:
> TS: No, I see exactly where you're heading. In the next console generation you could have consoles consist of a single non-commodity chip. It could be a general processor, whether it evolved from a past CPU architecture or GPU architecture, and it could potentially run everything—the graphics, the AI, sound, and all these systems in an entirely homogeneous manner. That's a very interesting prospect, because it could dramatically simplify the toolset and the processes for creating software.
>
> Right now, in the course of shipping Unreal 3, we have to use multiple programming languages. We use one programming language for writing pixel shaders, another for writing gameplay code, and then on PlayStation 3 we use yet another compiler to write code to run on the Cell processor. So the PlayStation 3 ends up being a particular challenge, because there you have three completely different processors from different vendors with different instruction sets and different compilers and different performance techniques. So, a lot of the complexity is unnecessary and makes load-balancing more difficult. When you have, for example, three different chips with different programming capabilities, you often have two of those chips sitting idle for much of the time, while the other is maxed out. But if the architecture is completely uniform, then you can run any task on any part of the chip at any time, and get the best performance tradeoff that way.