We've always been an Intel shop. All the developers use Intel machines, recommended platform for end users is Intel, and if end users want to run on AMD it's their lookout. Maybe the test department had an AMD machine somewhere to check we didn't ship anything completely broken, but that was about it.

Up until a few years ago we just used the MSVC compiler, and since it doesn't really offer many processor tuning options beyond SSE level, no one worried too much about whether the code might favour one x86 vendor over another. More recently, however, we've been using the Intel compiler a lot. Our stuff definitely gets some significant performance benefits from it (on our Intel hardware), and its vectorization capabilities mean less need to go to asm/intrinsics. However, people are starting to get a bit nervous about whether the Intel compiler may actually not be doing such a good job for AMD hardware. Certainly if you step into the Intel CRT or IPP libraries you see a lot of cpuid queries, apparently to set up jump tables to optimised functions. It seems unlikely that Intel goes to much trouble to do anything good for AMD's chips, though.

Can anyone with any experience in this area comment on whether it's a big deal or not in practice? (We've yet to actually do any performance testing on AMD ourselves.)

Update 2010-01-04: Well, the need to support AMD never became concrete enough for me to do any testing myself. There are some interesting reads on the issue here, here and here though.

Update 2010-08-09: It seems the Intel-FTC settlement has something to say about this issue - see the "Compilers and Dirty Tricks" section of this article.

+9  A: 

Buy an AMD box and run it on that. That seems like the only responsible thing to do, rather than trusting strangers on the internet ;)

Apart from that, I believe part of AMD's lawsuit against Intel is based on the claim that Intel's compiler specifically produces code that runs inefficiently on AMD processors. I don't know whether that's true or not, but AMD seems to believe so.

But even if they don't willfully do that, there's no doubt that Intel's compiler optimizes specifically for Intel processors and nothing else.

That said, I doubt it'd make a huge difference. AMD CPUs would still benefit from all the auto-vectorization and other clever features of the compiler.

jalf
A: 

I'm surely stating the obvious, but if performance is crucial for your application, then you'd better do some testing - on all combinations of hardware/compiler. There are no guarantees. As outsiders, we can only give you our guesses/biases. Your software may have unique characteristics that are unlike what we've seen.

My experience:

I used to work at Intel, and developed an in-house (C++) application where performance was critical. We tried to use Intel's C++ compiler, and it always underperformed gcc - even after doing profile runs, recompiling using the profiled information (which icc supposedly uses to optimize) and re-running on the exact same dataset (this was in 2005-2007, things may be different now). So, based on my experience, you might want to try gcc (in addition to icc and MSVC); it's possible you will get better performance that way and side-step the question. It shouldn't be too hard to switch compilers (if your build process is reasonable).

Now I work at a different company, where the IT folks do extensive hardware testing. For a while Intel and AMD hardware were relatively comparable, but the latest generation of Intel hardware significantly out-performed the AMD. As a result, I believe they purchased significant amounts of Intel CPUs and recommend the same for our customers who run our software.

But, back to the question as to whether the Intel compiler specifically targets AMD hardware to run slowly. I doubt Intel bothers with that. It could be that certain optimizations that use knowledge about the internals of Intel CPU architecture or chipsets could run slower on AMD hardware, but I doubt they specifically target AMD hardware.

Trey Jackson
+2  A: 

Sorry if you hit my general button.

This is on the subject of low-level optimization, so it only matters for code that 1) the program counter spends much time in, and 2) the compiler actually sees. For example, if the PC spends most of its time in library routines that you don't compile, it shouldn't matter very much.

Whether or not conditions 1 & 2 are met, here's my experience of how optimization goes:

Several iterations of sampling and fixing are done. In each of these, a problem is identified and most often it is not about where the program counter is. Rather it is that there are function calls at mid-levels of the call stack that, since performance is paramount, could be replaced. To find them quickly, I do this.

Keep in mind that if there is a function call instruction that is on the stack for a significant fraction of execution time, whether in a few long invocations or a great many short ones, that call is responsible for that fraction of time, so removing it or executing it less often can save a lot of time. And that saving far exceeds anything low-level optimization can give.
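
To make the "fraction of time on the stack" idea concrete, here is a minimal sketch. It assumes you have already captured some stack samples by whatever means and stored each one as a list of call-site labels; the names `StackSample`, `fraction_on_stack` and `max_speedup` are made up for illustration and do not come from any profiler.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One stack sample: the call sites active at the moment the program
// was interrupted (hypothetical representation, outermost first).
using StackSample = std::vector<std::string>;

// Fraction of samples in which `call_site` appears anywhere on the stack.
// This estimates the share of run time that call is responsible for.
double fraction_on_stack(const std::vector<StackSample>& samples,
                         const std::string& call_site) {
    if (samples.empty()) return 0.0;
    std::size_t hits = 0;
    for (const auto& sample : samples) {
        if (std::find(sample.begin(), sample.end(), call_site) != sample.end()) {
            ++hits;
        }
    }
    return static_cast<double>(hits) / static_cast<double>(samples.size());
}

// Upper bound on the speedup if that call were removed entirely:
// the remaining run time is (1 - fraction) of the original.
// Only meaningful for fraction < 1.
double max_speedup(double fraction) {
    return 1.0 / (1.0 - fraction);
}
```

With only a handful of samples the estimate is rough, but a call site that shows up in, say, half of them is worth looking at regardless of which compiler built the code.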

The program can now be many times faster than it was to begin with. I've never seen any good-sized program, no matter how carefully written, that could not benefit from this process. If the process has not been done, it should not be assumed that low-level optimization is the only way to speed up the program.

After this process has been done to the point where it simply can't be done any more, and if samples show that the PC is in code that the compiler sees, then the low-level optimization can make a difference.

Mike Dunlavey
I don't see that this is applicable, though. The question is how code compiled with the Intel compiler works on AMD processors, not how to optimize by hand.
David Thornley
@David: timday's question was "how much should he worry", so that is what I was trying to answer. Yes, for hotspot-type code he should worry. (http://www.agner.org/optimize/blog/read.php?i=49) I was trying to convey that hotspot-type code is rarer than one might think.
Mike Dunlavey
+2  A: 

What we have seen is that wherever the Intel compiler must make a runtime choice about the available instruction set, if it does not recognize an Intel CPU, it falls back to their "standard" code path (which, as you might expect, may not be optimal).

Note that although I used the word "compiler" above, this mainly happens in their supplied (pre-compiled) libraries and intrinsics, which check the instruction set and call the best code.
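
A rough sketch of the difference, purely for illustration: this is not Intel's code, and `CpuInfo`, `select_by_vendor`, `select_by_feature` and the two sum routines are all invented names. The "tuned" routine is a plain C++ stand-in for what would really be an SSE2 implementation, and `CpuInfo` stands in for whatever a real cpuid query would report.

```cpp
#include <cstddef>
#include <string>

// Hypothetical summary of what a cpuid query might report.
struct CpuInfo {
    std::string vendor;  // e.g. "GenuineIntel" or "AuthenticAMD"
    bool has_sse2;
};

using SumFn = float (*)(const float*, std::size_t);

// "Standard" path: plain scalar loop.
float sum_generic(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Stand-in for a tuned path: four partial sums, as a 4-wide SIMD
// version would keep. A real one would use SSE2 intrinsics.
float sum_sse2(const float* data, std::size_t n) {
    float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; ++k) lanes[k] += data[i + k];
    }
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) total += data[i];  // leftover elements
    return total;
}

// Vendor-gated dispatch: the tuned path is chosen only when the vendor
// string matches, so other vendors get the standard path even if they
// support the same instructions.
SumFn select_by_vendor(const CpuInfo& cpu) {
    if (cpu.vendor == "GenuineIntel" && cpu.has_sse2) return sum_sse2;
    return sum_generic;
}

// Feature-gated dispatch: only the reported instruction set matters.
SumFn select_by_feature(const CpuInfo& cpu) {
    return cpu.has_sse2 ? sum_sse2 : sum_generic;
}
```

For a `CpuInfo{"AuthenticAMD", true}`, the vendor-gated selector returns `sum_generic` while the feature-gated one returns `sum_sse2`; both routines compute the same result, only the speed would differ in a real implementation.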

Juice