Hello!

Does anyone know how to speed up boost::numeric::ublas::vector?

I am using typedef ublas::vector<float, ublas::bounded_array<float, 3> > MYVECTOR3 and comparing its speed to D3DXVECTOR3 on simple operations.

The test looks like this:

#include <gtest/gtest.h>
#include <d3dx9.h>
#pragma comment(lib, "d3dx9.lib")

static const size_t kRuns = static_cast<size_t>(10e6); // 10e6 == 10,000,000 iterations

TEST(Performance, CStyleVectors) {

   D3DXVECTOR3 a(1.0f, 2.0f, 3.0f);
   D3DXVECTOR3 b(2.0f, 3.0f, 1.0f);
   D3DXVECTOR3 c(6.0f, 4.0f, 5.0f);

   for (size_t i = 0; i < kRuns; ++i) {
      c = c + (a + b) * 0.5f;
   }
}

#include <boost/numeric/ublas/vector.hpp>

TEST(Performance, CppStyleVectors) {

   typedef boost::numeric::ublas::vector<float, 
      boost::numeric::ublas::bounded_array<float, 3> > MYVECTOR3;

   MYVECTOR3 a(3), b(3), c(3);
   a[0] = 1.0f, a[1] = 2.0f, a[2] = 3.0f;
   b[0] = 2.0f, b[1] = 3.0f, b[2] = 1.0f;
   c[0] = 6.0f, c[1] = 4.0f, c[2] = 5.0f;

   for (size_t i = 0; i < kRuns; ++i) {
      noalias(c) = c + (a + b) * 0.5f;
   }
}

And the results are:

[----------] 2 tests from Performance
[ RUN      ] Performance.CStyleVectors
[       OK ] Performance.CStyleVectors (484 ms)
[ RUN      ] Performance.CppStyleVectors
[       OK ] Performance.CppStyleVectors (9406 ms)
[----------] 2 tests from Performance (9890 ms total)

As you can see, the plain C-style vector is about 20 times faster than the one from boost::numeric::ublas, even with stack-based bounded_array storage. Does anyone have an idea how I could speed it up?

Maybe by writing a custom wrapper or something like that?

Thank you

A:

Boost uBLAS (and BLAS in general) provides support for vector and matrix algebra where the number of dimensions is determined at runtime. It is suitable for solving certain numerical problems (such as FEM or similar simulations, optimization problems, and approximation). For these problems it is relatively fast, but it cannot compete in performance with a specialized 3D vector library on that library's own turf.

Use some other library. If D3DXVECTOR3 is not enough, check out, e.g., CGAL.
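For comparison, here is a minimal sketch of what a specialized fixed-size 3D vector can look like. Vec3 and run_update are hypothetical names for illustration only, not part of D3DX, CGAL, or uBLAS. The point is that the dimension is part of the type, so each operator is just three scalar operations with no size bookkeeping.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal 3D vector (not D3DXVECTOR3 or any library type).
// The size is fixed at compile time, so there is no runtime size member
// and no loop over a runtime dimension.
struct Vec3 {
    float x, y, z;
};

inline Vec3 operator+(const Vec3& u, const Vec3& v) {
    return Vec3{u.x + v.x, u.y + v.y, u.z + v.z};
}

inline Vec3 operator*(const Vec3& u, float s) {
    return Vec3{u.x * s, u.y * s, u.z * s};
}

// Same update as the benchmark loop in the question: c = c + (a + b) * 0.5f.
// The final c is returned so the caller can use the result.
inline Vec3 run_update(Vec3 a, Vec3 b, Vec3 c, std::size_t runs) {
    for (std::size_t i = 0; i < runs; ++i) {
        c = c + (a + b) * 0.5f;
    }
    return c;
}
```

Because everything is visible and fixed-size, the compiler can inline these operators completely, which is the kind of code a specialized 3D library aims for.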

Tomek Szpakowicz
[VMMlib](http://vmmlib.sourceforge.net/) might also be a good alternative.
greyfade
A: 

I think you may get better performance if you derive a specialized 3D vector class from ublas::bounded_vector, with hand-coded constructors, assignment operators, etc. Something like this code (which uses doubles):

#include <boost/numeric/ublas/vector.hpp>

/**

  A 3D vector

*/
class c3d : public boost::numeric::ublas::bounded_vector<double, 3>
{
    typedef boost::numeric::ublas::bounded_vector<double, 3> Base_vector;
public:

    //  ctors
    c3d () : Base_vector()
    {}
    c3d (double x, double y, double z) : Base_vector()
    { Base_vector::iterator p = begin(); *p++=x; *p++=y; *p++=z; }
    // construct from any uBLAS vector expression, e.g. a + b
    template <class R> c3d (const boost::numeric::ublas::vector_expression<R>& r) : Base_vector(r)
    {}
    // assign from any uBLAS vector expression
    template <class R> c3d& operator=(const boost::numeric::ublas::vector_expression<R>& r)
    { Base_vector::operator=(r); return *this; }
    // assign from the base type; no template parameter is needed here
    // (an unused, non-deducible template parameter would make it uncallable)
    c3d& operator=(const Base_vector& r)
    { Base_vector::operator=(r); return *this; }
};
ravenspoint
"It should be noted that this only changes the storage uBLAS uses for the vector3. uBLAS will still use all the same algorithm (which assume a variable size) to manipulate the vector3. In practice this seems to have no negative impact on speed. The above runs just as quickly as a hand crafted vector3 which does not use uBLAS. The only negative impact is that the vector3 always store a "size" member which in this case is redundant." Source: http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Effective_UBLAS
ravenspoint
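To make the quoted point about the redundant "size" member concrete, here is a small sketch with two hypothetical types, FixedVec3 and SizedVec3 (neither is a uBLAS class). Both hold the same three floats; SizedVec3 additionally stores a runtime size, roughly as the quoted text describes. A generic algorithm written against size() works for both, but only FixedVec3 exposes the trip count as a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: storage only, the size is a compile-time constant.
struct FixedVec3 {
    float data[3];
    static constexpr std::size_t size() { return 3; }
};

// Hypothetical: same storage plus a runtime size that is always 3,
// i.e. the redundant member mentioned in the quote.
struct SizedVec3 {
    float data[3];
    std::size_t size_;
    std::size_t size() const { return size_; }
};

// A generic "algorithm" that assumes a variable size: it loops to size().
template <class V>
float sum(const V& v) {
    float s = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        s += v.data[i];
    }
    return s;
}
```

Both produce the same result; the difference is the extra storage for size_ and whether the optimizer can see the loop bound.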
A: 

I took another look at this and realised that the best way to speed it up would be to rewrite

   for (size_t i = 0; i < kRuns; ++i) {
      c = c + (a + b) * 0.5f;
   }

as

c = c + kRuns * (a + b) * 0.5f;

which takes no time at all.
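The rewrite works because the loop adds the same increment (a + b) * 0.5 on every pass, so after n passes c_n = c_0 + n * (a + b) * 0.5. A small sketch comparing the two forms on std::array; iterate and closed_form are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Arr3 = std::array<float, 3>;

// Iterative form: repeat c = c + (a + b) * 0.5f, n times.
inline Arr3 iterate(Arr3 a, Arr3 b, Arr3 c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < 3; ++k)
            c[k] = c[k] + (a[k] + b[k]) * 0.5f;
    return c;
}

// Closed form: the increment is constant, so add it n times at once.
inline Arr3 closed_form(Arr3 a, Arr3 b, Arr3 c, std::size_t n) {
    const float fn = static_cast<float>(n);
    for (std::size_t k = 0; k < 3; ++k)
        c[k] = c[k] + fn * (a[k] + b[k]) * 0.5f;
    return c;
}
```

With the question's values and n = 1000 both forms give exactly (1506, 2504, 2005), because every intermediate value is a multiple of 0.5 well below 2^24. In general, floating-point rounding means the two forms can differ slightly.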

When I hand-code it using plain arrays, my optimizing compiler evidently does exactly this, because the 'loop' runs a million times in a time too short to measure.

   float a[3], b[3], c[3];
   a[0] = 1.0f, a[1] = 2.0f, a[2] = 3.0f;
   b[0] = 2.0f, b[1] = 3.0f, b[2] = 1.0f;
   c[0] = 6.0f, c[1] = 4.0f, c[2] = 5.0f;

   for (size_t i = 0; i < KRUNS; ++i) {
       c[0] = c[0] + ( a[0] + b[0] ) * 0.5f;  // 0.5f keeps the arithmetic in float
       c[1] = c[1] + ( a[1] + b[1] ) * 0.5f;
       c[2] = c[2] + ( a[2] + b[2] ) * 0.5f;
   }

Doesn't yours?
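Whether the compiler folds the loop algebraically or simply discards it because c is never read afterwards is hard to tell from the timing alone. A common safeguard when benchmarking is to keep the result observable and take the iteration count as a parameter. A sketch under that assumption; plain_array_loop is a hypothetical helper, and this makes removal harder rather than impossible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the plain-array loop above, with the result copied
// to `out` so the computed values are actually used by the caller.
inline void plain_array_loop(float out[3], std::size_t runs) {
    float a[3] = {1.0f, 2.0f, 3.0f};
    float b[3] = {2.0f, 3.0f, 1.0f};
    float c[3] = {6.0f, 4.0f, 5.0f};
    for (std::size_t i = 0; i < runs; ++i) {
        c[0] = c[0] + (a[0] + b[0]) * 0.5f;
        c[1] = c[1] + (a[1] + b[1]) * 0.5f;
        c[2] = c[2] + (a[2] + b[2]) * 0.5f;
    }
    out[0] = c[0];
    out[1] = c[1];
    out[2] = c[2];
}
```

Printing or otherwise consuming out after the timed region gives both the plain-array and the uBLAS versions the same chance of surviving optimization.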

Using the ublas library prevents the optimizer from doing its thing. Running this code

   #define KRUNS 1000000
   typedef boost::numeric::ublas::vector<float, 
      boost::numeric::ublas::bounded_array<float, 3> > MYVECTOR3;

   MYVECTOR3 a(3), b(3), c(3);
   a[0] = 1.0f, a[1] = 2.0f, a[2] = 3.0f;
   b[0] = 2.0f, b[1] = 3.0f, b[2] = 1.0f;
   c[0] = 6.0f, c[1] = 4.0f, c[2] = 5.0f;

   for (size_t i = 0; i < KRUNS; ++i) {
      noalias(c) = c + (a + b) * 0.5f;
   }

takes 63 milliseconds (with KRUNS = 1,000,000, a tenth of the question's kRuns = 10e6). I cannot imagine why it should take 9400 milliseconds for you, no matter how slow your machine is. I have to ask again: are you sure you have switched on optimization and are linking to the release libraries?
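Comparing numbers like 63 ms and 9400 ms only makes sense with the same iteration count and the same build settings (optimization on; as far as I know uBLAS also skips its internal runtime checks when NDEBUG or BOOST_UBLAS_NDEBUG is defined). A minimal timing helper built on std::chrono::steady_clock; time_ms is a hypothetical name, not part of Google Test or Boost.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs f() once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <class F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, wrap each version of the loop in a lambda, pass it to time_ms with the same iteration count, and use the loop's result afterwards in both cases.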

ravenspoint