views:

1961

answers:

12

I am trying to compare the performance of boost::multi_array to native dynamically allocated arrays, with the following test program:

#include <windows.h>
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS 
#include <boost/multi_array.hpp>

int main(int argc, char* argv[])
{
    const int X_SIZE = 200;
    const int Y_SIZE = 200;
    const int ITERATIONS = 500;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array
    typedef boost::multi_array<double, 2> ImageArrayType;
    ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);

    // Create the native array
    double *nativeMatrix = new double [X_SIZE * Y_SIZE];

    //------------------Measure boost----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                boostMatrix[x][y] = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Boost] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    //------------------Measure native-----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                nativeMatrix[x + (y * X_SIZE)] = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Native]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    return 0;
}

I get the following results:

[Boost] Elapsed time: 12.500 seconds
[Native]Elapsed time:  0.062 seconds

I can't believe multi_arrays are that much slower. Can anyone spot what I am doing wrong?

I assume caching is not an issue since I am doing writes to memory.

EDIT: This was a debug build. Per Laserallan's suggest I did a release build:

[Boost] Elapsed time:  0.266 seconds
[Native]Elapsed time:  0.016 seconds

Much closer. But 16 to 1 still seems to high to me.

Well, no definitive answer, but I'm going to move on and leave my real code with native arrays for now.

Accepting Laserallan's answer because it was the biggest flaw in my test.

Thanks to all.

+5  A: 

Are you building release or debug?

If running in debug mode, the boost array might be really slow because their template magic isn't inlined properly giving lots of overhead in function calls. I'm not sure how multi array is implemented though so this might be totally off :)

Perhaps there is some difference in storage order as well so you might be having your image stored column by column and writing it row by row. This would give poor cache behavior and may slow down things.

Try switching the order of the X and Y loop and see if you gain anything. There is some info on the storage ordering here: http://www.boost.org/doc/libs/1_37_0/libs/multi_array/doc/user.html

EDIT: Since you seem to be using the two dimensional array for image processing you might be interested in checking out boosts image processing library gil.

It might have arrays with less overhead that works perfectly for your situation.

Laserallan
Switching X and Y made no difference. Thanks.
Steve Fallows
I am in fact doing image processing. Thanks for the link. I'll check it out.
Steve Fallows
+1  A: 

Another thing to try is to use iterators instead of a straight index for the boost array.

bradtgmurray
Thanks for the idea - I'll test it. But as my ultimate goal is to refactor some code with native arrays to use boost, I'd rather not change all the existing [][] notation.
Steve Fallows
+2  A: 

I would have expected multiarray to be just as efficient. But I'm getting similar results on a PPC Mac using gcc. I also tried multiarrayref, so that both versions were using the same storage with no difference. This is good to know, since I use multiarray in some of my code, and just assumed it was similar to hand-coding.

KeithB
+1  A: 

I think I know what the problem is...maybe.

In order for the boost implementation to have a syntax like: matrix[x][y]. that means that matrix[x] has to return a reference to an object which acts like a 1D array column, at which point reference[y] gives you your element.

The problem here is that you are iterating in row major order (which is typical in c/c++ since native arrays are row major IIRC. The compiler has to re-execute matrix[x] for each y in this case. If you iterated in column major order when using the boost matrix, you may see better performance.

Just a theory.

EDIT: on my linux system (with some minor changes) I tested my theory, and did show some performance improvement by switching x and y, but it was still slower than a native array. This might be a simple issue of the compiler not being able to optimize away the temporary reference type.

Evan Teran
+3  A: 

Your test is flawed.

  • In a DEBUG build, boost::MultiArray lacks the optimization pass that it sorely needs. (Much more than a native array would)
  • In a RELEASE build, your compiler will look for code that can be removed outright and most of your code is in that category.

What you're likely seeing is the result of your optimizing compiler seeing that most or all of your "native array" loops can be removed. The same is theoretically true of your boost::MultiArray loops, but MultiArray is probably complex enough to defeat your optimizer.

Make this small change to your testbed and you'll see more true-to-life results: Change both occurances of " = 2.345 " with " *= 2.345 " and compile again with optimizations. This will prevent your compiler from discovering that the outer loop of each test is redundant.

I did it and got a speed comparison closer to 2:1.

Shmoopty
I don't believe the native code is being optimized out. Added a print of an arbitrary element in the array after the loop. No change to native time.
Steve Fallows
I clarified my description, Steve. Hope that helps. Even with an arbitrary print, the loop has to run once, not 500 times.
Shmoopty
Good point. But strangely, I don't get any change in time from going to *= 2.345.
Steve Fallows
Playing "outsmart the compiler" can be tricky. The change I mentioned worked with gcc4, but it looks like you're using VC++. The arbitrary print you added...try putting that inside the "500" loop. All told, I got the same 16:1 difference you describe until I fooled the optimizer.
Shmoopty
I did two things to verify that ths full loop is running: changed size to 2x X 2y X 5 iter and observed 10 breakpoint stops; changed calculation to set each element to x * y * i and verified last element. It seems to be running the full loop.
Steve Fallows
A: 

Build in release mode, use objdump, and look at the assembly. They may be doing completely different things, and you'll be able to see which optimizations the compiler is using.

bradtgmurray
+3  A: 

Consider using Blitz++ instead. I tried out Blitz, and its performance is on par with C-style array!

Check out your code with Blitz added below:


#include <windows.h>
#define _SCL_SECURE_NO_WARNINGS
#define BOOST_DISABLE_ASSERTS 
#include <boost/multi_array.hpp>
#include <blitz/array.h>

int main(int argc, char* argv[])
{
    const int X_SIZE = 200;
    const int Y_SIZE = 200;
    const int ITERATIONS = 500;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array
    typedef boost::multi_array<double, 2> ImageArrayType;
    ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);


    //------------------Measure boost----------------------------------------------
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                boostMatrix[x][y] = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Boost] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);

    //------------------Measure blitz-----------------------------------------------
    blitz::Array<double, 2> blitzArray( X_SIZE, Y_SIZE );
    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                blitzArray(x,y) = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Blitz] Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);


    //------------------Measure native-----------------------------------------------
    // Create the native array
    double *nativeMatrix = new double [X_SIZE * Y_SIZE];

    startTime = ::GetTickCount();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                nativeMatrix[x + (y * X_SIZE)] = 2.345;
            }
        }
    }
    endTime = ::GetTickCount();
    printf("[Native]Elapsed time: %6.3f seconds\n", (endTime - startTime) / 1000.0);



    return 0;
}


Here's the result in debug and release.

DEBUG:

Boost 2.093 secs

Blitz 0.375 secs

Native 0.078 secs

RELEASE:

Boost 0.266 secs

Blitz 0.016 secs

Native 0.015 secs

I used MSVC 2008 SP1 compiler for this.

Can we now say good-bye to C-stlye array? =p

ShaChris23
Thanks for the suggestion. I'll look into blitz...
Steve Fallows
Just a note: you shouldnt simply download version .9 from their website. The best way is to check out from their CVS repository. The instruction is here: http://sourceforge.net/cvs/?group_id=63961Seems like you are developing for Windows platform; in that case, look for Blitz-VS2005.zip.
ShaChris23
+2  A: 

I am wondering two things:

1) bounds check: define the BOOST_DISABLE_ASSERTS preprocessor macro prior to including multi_array.hpp in your application. This turns off bound checking. not sure if this this is disables when NDEBUG is.

2) base index: MultiArray can index arrays from bases different from 0. That means that multi_array stores a base number (in each dimension) and uses a more complicated formula to obtain the exact location in memory, I am wondering if it is all about that.

Otherwise I don't understand why multiarray should be slower than C-arrays.

both of those points are good observations.
ShaChris23
+1  A: 

ShaChris23's blitz timing code seems to have the loop orders incorrect for blitz and boost. For me, flipping the y and x loops only for boost/blitz produces better results making all three roughly equivalent.

Rhys Ulerich
+1  A: 

A similar question was asked and answered here:

http://www.codeguru.com/forum/archive/index.php/t-300014.html

The short answer is that it is easiest for the compiler to optimize the simple arrays, and not so easy to optimize the Boost version. Hence, a particular compiler may not give the Boost version all the same optimization benefits.

Compilers can also vary in how well they will optimize vs. how conservative they will be (e.g. with templated code or other complications).

Eric
A: 

I modified the above code in visual studio 2008 v9.0.21022 and applied the container routines from the Numerical Recipe routines for C and C++

http://www.nrbook.com/nr3/ using their licensed routines dmatrix and MatDoub respectively

dmatrix uses the out of date syntax malloc operator and is not recommended... MatDoub uses the New command

The speed in seconds are in Release version:

Boost: 0.437

Native: 0.032

Numerical Recipes C: 0.031

Numerical recipes C++: 0.031

So from the above blitz looks like the best free alternative.

Ahmos Sansom
+1  A: 

I tested on a Snow Leopard Mac OS using gcc 4.2.1

Debug:
[Boost] Elapsed time:  2.268 seconds
[Native]Elapsed time:  0.076 seconds

Release:
[Boost] Elapsed time:  0.065 seconds
[Native]Elapsed time:  0.020 seconds

Here, is the code (modified so that it can be compiled on Unix):

#define BOOST_DISABLE_ASSERTS
#include <boost/multi_array.hpp>
#include <ctime>

int main(int argc, char* argv[])
{
    const int X_SIZE = 200;
    const int Y_SIZE = 200;
    const int ITERATIONS = 500;
    unsigned int startTime = 0;
    unsigned int endTime = 0;

    // Create the boost array
    typedef boost::multi_array<double, 2> ImageArrayType;
    ImageArrayType boostMatrix(boost::extents[X_SIZE][Y_SIZE]);

    // Create the native array
    double *nativeMatrix = new double [X_SIZE * Y_SIZE];

    //------------------Measure boost----------------------------------------------
    startTime = clock();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                boostMatrix[x][y] = 2.345;
            }
        }
    }
    endTime = clock();
    printf("[Boost] Elapsed time: %6.3f seconds\n", (endTime - startTime) / (double)CLOCKS_PER_SEC);

    //------------------Measure native-----------------------------------------------
    startTime = clock();
    for (int i = 0; i < ITERATIONS; ++i)
    {
        for (int y = 0; y < Y_SIZE; ++y)
        {
            for (int x = 0; x < X_SIZE; ++x)
            {
                nativeMatrix[x + (y * X_SIZE)] = 2.345;
            }
        }
    }
    endTime = clock();
    printf("[Native]Elapsed time: %6.3f seconds\n", (endTime - startTime) / (double)CLOCKS_PER_SEC);

    return 0;
}
celil