Short version: I'm wondering whether it's possible, and how best, to use CPU-specific instructions within a DLL.

Slightly longer version: When downloading (32-bit) DLLs from, say, Microsoft, it seems that one size fits all processors.

Does this mean that they are strictly built for the lowest common denominator (i.e. the minimum platform supported by the OS)? Or is there some technique for exporting a single interface from the DLL while using CPU-specific code behind the scenes to get optimal performance? And if so, how is it done?

+2  A: 

The DLL is expected to work on every computer Win32 runs on, so in general you are stuck with the i386 instruction set. There is no official method of exposing functionality/code for specific instruction sets. You have to do it by hand, transparently to the caller.

The technique basically works as follows:

- determine CPU features such as MMX and SSE at runtime
- if they are present, use them; if not, have fallback code ready

Because you cannot let your compiler optimise for anything other than i386, you will have to write the code that uses the specific instruction sets in inline assembler. I don't know if there are higher-level language toolkits for this. Determining the CPU features is straightforward, but may also need to be done in assembler.
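As a rough sketch of the detect-then-fall-back idea above: on current compilers the detection step need not be hand-written assembler, since MSVC exposes `__cpuid` in `<intrin.h>` and GCC/Clang expose `__get_cpuid` in `<cpuid.h>`. The names `detect_cpu_features`, `sum_generic`, `sum_sse2` and `sum_ints` are made up for illustration, and the "SSE2" routine is only a scalar stand-in for real SIMD code.

```c
#include <stddef.h>

/* Read EDX of CPUID leaf 1, which carries the MMX/SSE/SSE2 feature bits. */
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
static unsigned cpuid_leaf1_edx(void)
{
    int regs[4];                 /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    return (unsigned)regs[3];
}
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
static unsigned cpuid_leaf1_edx(void)
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                /* leaf 1 not available */
    return edx;
}
#else
/* Not x86 or unknown compiler: report no optional features. */
static unsigned cpuid_leaf1_edx(void) { return 0; }
#endif

enum { CPU_MMX = 1, CPU_SSE = 2, CPU_SSE2 = 4 };

unsigned detect_cpu_features(void)
{
    unsigned edx = cpuid_leaf1_edx();
    unsigned features = 0;
    if (edx & (1u << 23)) features |= CPU_MMX;
    if (edx & (1u << 25)) features |= CPU_SSE;
    if (edx & (1u << 26)) features |= CPU_SSE2;
    return features;
}

/* Fallback that runs on any CPU. */
static int sum_generic(const int *v, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

/* Stand-in: a real version would use SSE2 instructions here. */
static int sum_sse2(const int *v, size_t n)
{
    return sum_generic(v, n);
}

/* Single public entry point: detect once, then pick an implementation. */
int sum_ints(const int *v, size_t n)
{
    static int features = -1;
    if (features < 0) features = (int)detect_cpu_features();
    return (features & CPU_SSE2) ? sum_sse2(v, n) : sum_generic(v, n);
}
```

Callers only ever see `sum_ints`, so the choice of code path stays invisible to them. The cached flag is not synchronised; in a DLL you would probably do the detection once at load time instead.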

ypnos
You could use preprocessor magic to compile the same .C file for different CPUs (e.g. giving the functions different names) and then link the different .OBJ files together. You'd need some logic to decide which function to call, though.
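A minimal sketch of what that preprocessor trick might look like, assuming a hypothetical `kernel.c` that is compiled once per target with a different `IMPL_SUFFIX`. The macro names, the function name and the command lines in the comment are illustrative only.

```c
/* kernel.c: build it once per CPU target, for example
 *   cl /c /DIMPL_SUFFIX=_generic /Fokernel_generic.obj kernel.c
 *   cl /c /arch:SSE2 /DIMPL_SUFFIX=_sse2 /Fokernel_sse2.obj kernel.c
 * and link both object files into the DLL.
 */
#include <stddef.h>

#ifndef IMPL_SUFFIX
#define IMPL_SUFFIX _generic     /* default when no suffix is given */
#endif

/* Two-level paste so that IMPL_SUFFIX is expanded before ## is applied. */
#define PASTE2(a, b) a##b
#define PASTE(a, b)  PASTE2(a, b)
#define IMPL(name)   PASTE(name, IMPL_SUFFIX)

/* Expands to sum_ints_generic, sum_ints_sse2, ... depending on the build. */
int IMPL(sum_ints)(const int *v, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}
```

The source is identical in every build; only the compiler's code generation and the exported symbol name differ. A separate dispatcher still has to choose between `sum_ints_generic` and `sum_ints_sse2` at runtime.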
Roger Lipscombe
+5  A: 

I don't know of any standard technique but if I had to make such a thing, I would write some code in the DllMain() function to detect the CPU type and populate a jump table with function pointers to CPU-optimized versions of each function.

There would also need to be a lowest common denominator function for when the CPU type is unknown.
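Something along these lines, as a sketch only. The table layout and the names `dispatch_table`, `init_dispatch`, `api_sum` and `api_max` are invented, the "SSE2" functions are scalar stand-ins, and the Windows part assumes `IsProcessorFeaturePresent` is an acceptable way to ask about SSE2.

```c
#include <stddef.h>

/* Jump table: one function pointer per exported operation. */
typedef struct {
    int (*sum)(const int *v, size_t n);
    int (*max)(const int *v, size_t n);
} dispatch_table;

/* Lowest-common-denominator versions. */
static int sum_generic(const int *v, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

static int max_generic(const int *v, size_t n)
{
    int m = n ? v[0] : 0;
    for (size_t i = 1; i < n; ++i) if (v[i] > m) m = v[i];
    return m;
}

/* Stand-ins for versions that would really use SSE2. */
static int sum_sse2(const int *v, size_t n) { return sum_generic(v, n); }
static int max_sse2(const int *v, size_t n) { return max_generic(v, n); }

/* Starts out on the fallbacks, so it is usable even before init. */
static dispatch_table g_dispatch = { sum_generic, max_generic };

void init_dispatch(int has_sse2)
{
    g_dispatch.sum = sum_generic;
    g_dispatch.max = max_generic;
    if (has_sse2) {
        g_dispatch.sum = sum_sse2;
        g_dispatch.max = max_sse2;
    }
}

/* Exported functions just forward through the table. */
int api_sum(const int *v, size_t n) { return g_dispatch.sum(v, n); }
int api_max(const int *v, size_t n) { return g_dispatch.max(v, n); }

#ifdef _WIN32
#include <windows.h>
/* Fill the table once, when the DLL is mapped into the process. */
BOOL WINAPI DllMain(HINSTANCE inst, DWORD reason, LPVOID reserved)
{
    (void)inst;
    (void)reserved;
    if (reason == DLL_PROCESS_ATTACH)
        init_dispatch(IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE) != 0);
    return TRUE;
}
#endif
```

Filling the table entry by entry means an operation with no optimized version can simply keep its generic pointer.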

You can find current CPU info in the registry here:

HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor
Adam Pierce
+1  A: 

An easy way to get the SSE/SSE2 optimizations is to just use the /arch argument for MSVC. I wouldn't worry about fallback--there is no reason to support anything below that unless you have a very niche application.

http://msdn.microsoft.com/en-us/library/7t5yh4fd.aspx

I believe gcc/g++ have equivalent flags.
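For what it's worth, the GCC counterparts are `-msse` and `-msse2` (or a `-march=` value that implies them). Here is a small sketch for checking which level a translation unit was actually built for, using the compilers' predefined macros; `compiled_simd_level` is a made-up name.

```c
/* Build examples:
 *   cl  /arch:SSE2 file.c
 *   gcc -msse2     file.c
 *
 * Returns 2 if the compiler was allowed to emit SSE2, 1 for SSE only,
 * 0 if neither was enabled at compile time.
 */
int compiled_simd_level(void)
{
#if defined(_M_IX86_FP)
    /* MSVC, 32-bit x86: 0 = no SSE, 1 = /arch:SSE, 2 = /arch:SSE2 or higher */
    return _M_IX86_FP >= 2 ? 2 : _M_IX86_FP;
#elif defined(__SSE2__) || defined(_M_X64)
    /* GCC/Clang with SSE2 enabled, or MSVC x64 where SSE2 is baseline */
    return 2;
#elif defined(__SSE__)
    return 1;
#else
    return 0;
#endif
}
```

Note that this only reports what the compiler was told to target; it says nothing about the CPU the DLL ends up running on.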

Nick
+1  A: 

DLLs you download from Microsoft target the generic x86 architecture for the simple reason that they have to work across the multitude of machines out there.

Until the Visual Studio 6.0 time frame (I do not know whether this has changed), Microsoft optimized its DLLs for size rather than speed, because reducing the overall size of a DLL gave a bigger performance boost than any other optimization the compiler could generate. Speedups from micro-optimization are small compared with the speedup from not having the CPU wait on memory. True improvements in speed come from reducing I/O or from improving the underlying algorithm.

Only a few critical loops that run at the heart of the program could benefit from micro optimizations simply because of the huge number of times they are invoked. Only about 5-10% of your code might fall in this category. You could rest assured that such critical loops would already be optimized in assembler by the Microsoft software engineers to some level and not leave much behind for the compiler to find. (I know it's expecting too much but I hope they do this)

As you can see, bundling additional versions of the code tuned for different architectures would bring only drawbacks: the DLL grows, while most of that code is rarely used or is never part of the critical code that consumes most of your CPU cycles.

+1  A: 

Intel's ICC can compile code twice, for different architectures. That way, you can have your cake and eat it. (OK, you get two cakes: your DLL will be bigger.) And even MSVC2005 can do it for very specific cases (e.g. memcpy() can use SSE2).

There are many ways to switch between different versions. A DLL is loaded because the loading process needs functions from it, and function names are converted into addresses. One solution is to let this lookup depend not just on the function name but also on processor features. Another method uses the fact that the name-to-address lookup goes through a table of pointers as an intermediate step; you can switch out the entire table. Or you could even have a branch inside critical functions, so foo() calls foo__sse4 when that's faster.
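To illustrate the "switch out the entire table" variant in plain C (not the loader's real import table), here is a sketch with two complete, read-only tables and a single pointer that selects one of them. Apart from `foo` and `foo__sse4`, all names are invented, and the SSE4 functions are scalar stand-ins.

```c
/* One complete set of implementations per CPU flavour. */
typedef struct {
    const char *name;
    int (*foo)(int);
    int (*bar)(int);
} api_table;

static int foo__generic(int x) { return x * 2; }
static int bar__generic(int x) { return x + 1; }

/* Stand-ins: real versions would be built with SSE4 code paths. */
static int foo__sse4(int x) { return x * 2; }
static int bar__sse4(int x) { return x + 1; }

static const api_table generic_api = { "generic", foo__generic, bar__generic };
static const api_table sse4_api    = { "sse4",    foo__sse4,    bar__sse4 };

/* Everything is routed through this one pointer. */
static const api_table *active_api = &generic_api;

/* Swap the whole table in one assignment. */
void select_api(int has_sse4)
{
    active_api = has_sse4 ? &sse4_api : &generic_api;
}

const char *active_api_name(void) { return active_api->name; }

/* Public entry points stay the same whichever table is active. */
int foo(int x) { return active_api->foo(x); }
int bar(int x) { return active_api->bar(x); }
```

Compared with a branch inside each function, this costs one indirect call per use but keeps the CPU check out of the hot path.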

MSalters