I'm working with some number-crunching code that, by its nature, is floating-point intensive and just plain slow. It's research code, so it can be tailored to one architecture, and it is running on a Core 2 Quad box. My understanding is that, for the Pentium 4/NetBurst architecture, Intel severely stripped down the x87 FPU and adopted a more SSE2-centric design, which resulted in horrible performance on x87 code. However, the Core 2 architecture is more closely related to the P6 architecture than to NetBurst.
My compiler does not target SSE at all, AFAIK, and my understanding is that very few compilers do this well. Furthermore, I am using the D language, which is fairly bleeding edge, so there just aren't many compilers available for it. However, I don't want to switch languages, both because of the inertia of my existing code and because, despite its immaturity, I really like D.
Does the Core 2 architecture also have a stripped-down x87 FPU? If so, what is the best way around this?