I'm looking for ideas on how to write a cross-platform C++ implementation of a few parallelizable problems so that I can take advantage of SIMD (SSE, SPU, etc.) when it is available. I also want to be able to switch between the SIMD and non-SIMD paths at run time.
How would you suggest I approach this problem? (Of course, I don't want to implement each problem separately for every possible option.)
I can see how this might not be an easy task in C++, but I believe I'm missing something. So far my idea looks like this: a class cStream will be an array of a single field. Using multiple cStreams I can achieve SoA (Structure of Arrays). Then, using a few functors, I can fake the lambda function that I need executed over the whole cStream.
// just an example; I'm not expecting this code to compile
cStream a; // something like float[1024]
cStream b;
cStream c;

void Foo()
{
    for_each(
        AssignSIMD(c, MulSIMD(AddSIMD(a, b), a)));
}
Here for_each will be responsible for incrementing the current pointer of each stream, as well as for inlining the functors' bodies in both the SIMD and the non-SIMD version.
Something like this:
// just an example; I'm not expecting this code to compile
for_each(functor<T> f)
{
#ifdef USE_SIMD
    if (simdEnabled)
        real_for_each(f<true>()); // true means use SIMD
    else
#endif
        real_for_each(f<false>());
}
Note that whether SIMD is enabled is checked only once, and that the loop wraps the whole functor rather than each individual operation.
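To make the idea concrete, here is a minimal compilable sketch of how it could look, under several assumptions: cStream is reduced to a thin wrapper around std::vector<float>, the SIMD path is SSE only and is compiled in only when the compiler reports SSE support, and the names Packet, Load, Add, Mul, load, add and mul are hypothetical helpers invented for this illustration. It is not meant as a definitive design, just one way the functors and the single run-time check might fit together.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: SSE is the only SIMD back end; enable it when the compiler says so.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define USE_SIMD 1
#endif

// Hypothetical stand-in for cStream: one contiguous array per field (SoA).
struct cStream { std::vector<float> v; };

// Packet<false> is the scalar back end; Packet<true> is specialized for SSE below.
template <bool Simd> struct Packet {
    using type = float;
    static constexpr std::size_t width() { return 1; }
    static type load(const float* p) { return *p; }
    static void store(float* p, type x) { *p = x; }
    static type add(type x, type y) { return x + y; }
    static type mul(type x, type y) { return x * y; }
};

#ifdef USE_SIMD
template <> struct Packet<true> {
    using type = __m128;
    static constexpr std::size_t width() { return 4; }
    static type load(const float* p) { return _mm_loadu_ps(p); }
    static void store(float* p, type x) { _mm_storeu_ps(p, x); }
    static type add(type x, type y) { return _mm_add_ps(x, y); }
    static type mul(type x, type y) { return _mm_mul_ps(x, y); }
};
#endif

// Expression nodes: each one is written once and evaluates with either back end.
struct Load {
    const float* data;
    template <bool S> typename Packet<S>::type eval(std::size_t i) const {
        return Packet<S>::load(data + i);
    }
};
template <class L, class R> struct Add {
    L l; R r;
    template <bool S> typename Packet<S>::type eval(std::size_t i) const {
        return Packet<S>::add(l.template eval<S>(i), r.template eval<S>(i));
    }
};
template <class L, class R> struct Mul {
    L l; R r;
    template <bool S> typename Packet<S>::type eval(std::size_t i) const {
        return Packet<S>::mul(l.template eval<S>(i), r.template eval<S>(i));
    }
};

inline Load load(const cStream& s) { return Load{s.v.data()}; }
template <class L, class R> Add<L, R> add(L l, R r) { return {l, r}; }
template <class L, class R> Mul<L, R> mul(L l, R r) { return {l, r}; }

// The loop wraps the whole expression, so every node is inlined into one body.
template <bool S, class E>
void real_for_each(float* out, std::size_t n, const E& e) {
    using P = Packet<S>;
    std::size_t i = 0;
    for (; i + P::width() <= n; i += P::width())
        P::store(out + i, e.template eval<S>(i));
    for (; i < n; ++i)  // scalar tail for leftover elements
        Packet<false>::store(out + i, e.template eval<false>(i));
}

bool simdEnabled = true;  // run-time switch, checked once per for_each call

template <class E>
void for_each(cStream& out, const E& e) {
#ifdef USE_SIMD
    if (simdEnabled) {
        real_for_each<true>(out.v.data(), out.v.size(), e);
        return;
    }
#endif
    real_for_each<false>(out.v.data(), out.v.size(), e);
}

// c = (a + b) * a, written once for both paths.
void Foo(cStream& c, const cStream& a, const cStream& b) {
    assert(a.v.size() == c.v.size() && b.v.size() == c.v.size());
    for_each(c, mul(add(load(a), load(b)), load(a)));
}
```

In this sketch the bool template parameter plays the role of f<true> and f<false> from the pseudocode above: the expression is built once, and only the Packet specialization differs between the two instantiations. Supporting another instruction set would presumably mean replacing the bool with a tag type and adding one more Packet specialization.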