views:

347

answers:

6

For one of the projects I'm doing right now, I need to look at the performance (amongst other things) of different concurrent enabled programming languages.

At the moment I'm looking into comparing stackless python and C++ PThreads, so the focus is on these two languages, but other languages will probably be tested later. Ofcourse the comparison must be as representative and accurate as possible, so my first thought was to start looking for some standard concurrent/multi-threaded benchmark problems, alas I couldn't find any decent or standard, tests/problems/benchmarks.

So my question is as follows: Do you have a suggestion for a good, easy or quick problem to test the performance of the programming language (and to expose it's strong and weak points in the process)?

+2  A: 

Surely you should be testing hardware and compilers rather than a language for concurrency performance?

I would be looking at a language from the point of view of how easy and productive it is in terms of concurrency and how much it 'insulates' the programmer from making locking mistakes.

EDIT: from past experience as a researcher designing parallel algorithms, I think you will find in most cases the concurrent performance will depend largely on how an algorithm is parallelised, and how it targets the underlying hardware.

Also, benchmarks are notoriously unequal; this is even more so in a parallel environment. For instance, a benchmark that 'crunches' very large matrices would be suited to a vector pipeline processor, whereas a parallel sort might be better suited to more general purpose multi core CPUs.

These might be useful:

Parallel Benchmarks

NAS Parallel Benchmarks

Mitch Wheat
You've tickled my interest!You say that you have experience as a researcher in parallel algorithms. Surely you can give me some interesting problems or algorithms you have worked on? And yes you are correct, benchmarks have very little meaning on their own,that's why I won't spend much time on them
Sven
It's been a while! I basically took an efficient sequential algorithm, and measured it's performance first sequentially and then against an increasing no. of CPUs.
Mitch Wheat
(as in the parallelised version...)
Mitch Wheat
A: 

Surely you should be testing hardware and compilers rather than a language for concurrency performance?

No, hardware and compilers are irrelevant for my testing purposes. I'm just looking for some good problems that can test how well code, written in one language, can compete against code from another language. I'm really testing the constructs available in the specific languages to do concurrent programming. And one of the criteria is performance (measured in time).

Some of the other test criteria I'm looking for are:

  • how easy is it to write correct code; because as we all know concurrent programming is harder then writing single threaded programs
  • what is the technique used to to concurrent programming: event-driven, actor based, message parsing, ...
  • how much code must be written by the programmer himself and how much is done automatically for him: this can also be tested with the given benchmark problems
  • what's the level of abstraction and how much overhead is involved when translated back to machine code

So actually, I'm not looking for performance as the only and best parameter (which would indeed send me to the hardware and the compilers instead of the language itself), I'm actually looking from a programmers point of view to check what language is best suited for what kind of problems, what it's weaknesses and strengths are and so on...

Bare in mind that this is just a small project and the tests are therefore to be kept small as well. (rigorous testing of everything is therefore not feasible)

Sven
If it's a small project, seriously, why bother? Just pick a language that provides good support for parallelisation.
Mitch Wheat
Because that's the basic idea of the project. Its just a small research project, nothing more is expected afterwards (ie using the language). Besides it's not about picking the *best* language, it's about showing that each language has it's place. Each language has it's strong and weak points...
Sven
+1  A: 

Well, there are a few classics, but different tests emphasize different features. Some distributed systems may be more robust, have more efficient message-passing, etc. Higher message overhead can hurt scalability, since it the normal way to scale up to more machines is to send a larger number of small messages. Some classic problems you can try are a distributed Sieve of Eratosthenes or a poorly implemented fibonacci sequence calculator (i.e. to calculate the 8th number in the series, spin of a machine for the 7th, and another for the 6th). Pretty much any divide-and-conquer algorithm can be done concurrently. You could also do a concurrent implementation of Conway's game of life or heat transfer. Note that all of these algorithms have different focuses and thus you probably will not get one distributed system doing the best in all of them.

I'd say the easiest one to implement quickly is the poorly implemented fibonacci calculator, though it places too much emphasis on creating threads and too little on communication between those threads.

Brian
Though easy to implement, I'm not sure that a "poorly implemented fibonacci calculator" is a particulary good test; it's a bit too contrived.
Mitch Wheat
True, it's contrived. But it was still a somewhat popular demo algorithm when I was researching distributed computing. If you want a distributed algorithm that emphasizes messages and places no emphasis whatsoever on anything else, there's always bittorrent.
Brian
A: 

I have decided to use the Mandelbrot set (the escape time algorithm to be more precise) to benchmark the different languages.
It fits me quite well as the original algorithm can easily be implemented and creating the multi threaded variant from it is not that much work.

below is the code I currently have. It is still a single threaded variant, but I'll update it as soon as I'm satisfied with the result.

#include <cstdlib> //for atoi
#include <iostream>
#include <iomanip> //for setw and setfill
#include <vector>


int DoThread(const double x, const double y, int maxiter) {
    double curX,curY,xSquare,ySquare;
    int i;

    curX = x + x*x - y*y;
    curY = y + x*y + x*y;
    ySquare = curY*curY;
    xSquare = curX*curX;

    for (i=0; i<maxiter && ySquare + xSquare < 4;i++) {
      ySquare = curY*curY;
      xSquare = curX*curX;
      curY = y + curX*curY + curX*curY;
      curX = x - ySquare + xSquare;
    }
    return i;
}

void SingleThreaded(int horizPixels, int vertPixels, int maxiter, std::vector<std::vector<int> >&  result) {
    for(int x = horizPixels; x > 0; x--) {
     for(int y = vertPixels; y > 0; y--) {
            //3.0 -> so we always have -1.5 -> 1.5 as the window; (x - (horizPixels / 2) will go from -horizPixels/2 to +horizPixels/2
      result[x-1][y-1] = DoThread((3.0 / horizPixels) * (x - (horizPixels / 2)),(3.0 / vertPixels) * (y - (vertPixels / 2)),maxiter);
     }
    }
}

int main(int argc, char* argv[]) {
    //first arg = length along horizontal axis
    int horizPixels = atoi(argv[1]);

    //second arg = length along vertical axis
    int vertPixels = atoi(argv[2]);

    //third arg = iterations
    int maxiter = atoi(argv[3]);

    //fourth arg = threads
    int threadCount = atoi(argv[4]);

    std::vector<std::vector<int> > result(horizPixels, std::vector<int>(vertPixels,0)); //create and init 2-dimensional vector
    SingleThreaded(horizPixels, vertPixels, maxiter, result);

    //TODO: remove these lines
    for(int y = 0; y < vertPixels; y++) {
      for(int x = 0; x < horizPixels; x++) {
      std::cout << std::setw(2) << std::setfill('0') << std::hex << result[x][y] << " ";
     }
     std::cout << std::endl;
    }
}

I've tested it with gcc under Linux, but I'm sure it works under other compilers/Operating Systems as well. To get it to work you have to enter some command line arguments like so:

mandelbrot 106 500 255 1

the first argument is the width (x-axis)
the second argument is the height (y-axis)
the third argument is the number of maximum iterations (the number of colors)
the last ons is the number of threads (but that one is currently not used)

on my resolution, the above example gives me a nice ASCII-art representation of a Mandelbrot set. But try it for yourself with different arguments (the first one will be the most important one, as that will be the width)

Sven
A: 

Below you can find the code I hacked together to test the multi threaded performance of pthreads. I haven't cleaned it up and no optimizations have been made; so the code is a bit raw.

the code to save the calculated mandelbrot set as a bitmap is not mine, you can find it here

#include <cstdlib> //for atoi
#include <iostream>
#include <iomanip> //for setw and setfill
#include <vector>

#include "bitmap_Image.h" //for saving the mandelbrot as a bmp

#include <pthread.h>

pthread_mutex_t mutexCounter;
int sharedCounter(0);
int percent(0);

int horizPixels(0);
int vertPixels(0);
int maxiter(0);

//doesn't need to be locked
std::vector<std::vector<int> > result; //create 2 dimensional vector

void *DoThread(void *null) {
    double curX,curY,xSquare,ySquare,x,y;
    int i, intx, inty, counter;
    counter = 0;

    do {
        counter++;
        pthread_mutex_lock (&mutexCounter); //lock
            intx = int((sharedCounter / vertPixels) + 0.5);
            inty = sharedCounter % vertPixels;
            sharedCounter++;
        pthread_mutex_unlock (&mutexCounter); //unlock

        //exit thread when finished
        if (intx >= horizPixels) {
            std::cout << "exited thread - I did " << counter << " calculations" << std::endl;
            pthread_exit((void*) 0);
        }

        //set x and y to the correct value now -> in the range like singlethread
        x = (3.0 / horizPixels) * (intx - (horizPixels / 1.5));
        y = (3.0 / vertPixels) * (inty - (vertPixels / 2));

        curX = x + x*x - y*y;
        curY = y + x*y + x*y;
        ySquare = curY*curY;
        xSquare = curX*curX;

        for (i=0; i<maxiter && ySquare + xSquare < 4;i++){
          ySquare = curY*curY;
          xSquare = curX*curX;
          curY = y + curX*curY + curX*curY;
          curX = x - ySquare + xSquare;
        }
        result[intx][inty] = i;
     } while (true);
}

int DoSingleThread(const double x, const double y) {
    double curX,curY,xSquare,ySquare;
    int i;

    curX = x + x*x - y*y;
    curY = y + x*y + x*y;
    ySquare = curY*curY;
    xSquare = curX*curX;

    for (i=0; i<maxiter && ySquare + xSquare < 4;i++){
      ySquare = curY*curY;
      xSquare = curX*curX;
      curY = y + curX*curY + curX*curY;
      curX = x - ySquare + xSquare;
    }
    return i;

}

void SingleThreaded(std::vector<std::vector<int> >&  result) {
    for(int x = horizPixels - 1; x != -1; x--) {
     for(int y = vertPixels - 1; y != -1; y--) {
            //3.0 -> so we always have -1.5 -> 1.5 as the window; (x - (horizPixels / 2) will go from -horizPixels/2 to +horizPixels/2
      result[x][y] = DoSingleThread((3.0 / horizPixels) * (x - (horizPixels / 1.5)),(3.0 / vertPixels) * (y - (vertPixels / 2)));
     }
    }
}

void MultiThreaded(int threadCount, std::vector<std::vector<int> >&  result) {
    /* Initialize and set thread detached attribute */
    pthread_t thread[threadCount];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);


    for (int i = 0; i < threadCount - 1; i++) {
        pthread_create(&thread[i], &attr, DoThread, NULL);
    }
    std::cout << "all threads created" << std::endl;

    for(int i = 0; i < threadCount - 1; i++) {
        pthread_join(thread[i], NULL);
    }
    std::cout << "all threads joined" << std::endl;
}

int main(int argc, char* argv[]) {
    //first arg = length along horizontal axis
    horizPixels = atoi(argv[1]);

    //second arg = length along vertical axis
    vertPixels = atoi(argv[2]);

    //third arg = iterations
    maxiter = atoi(argv[3]);

    //fourth arg = threads
    int threadCount = atoi(argv[4]);

    result = std::vector<std::vector<int> >(horizPixels, std::vector<int>(vertPixels,21)); // init 2-dimensional vector
    if (threadCount <= 1) {
        SingleThreaded(result);
    } else {
        MultiThreaded(threadCount, result);
    }


    //TODO: remove these lines
    bitmapImage image(horizPixels, vertPixels);
    for(int y = 0; y < vertPixels; y++) {
      for(int x = 0; x < horizPixels; x++) {
            image.setPixelRGB(x,y,16777216*result[x][y]/maxiter % 256, 65536*result[x][y]/maxiter % 256, 256*result[x][y]/maxiter % 256);
      //std::cout << std::setw(2) << std::setfill('0') << std::hex << result[x][y] << " ";
     }
     std::cout << std::endl;
    }

    image.saveToBitmapFile("~/Desktop/test.bmp",32);
}

good results can be obtained using the program with the following arguments:

mandelbrot 5120 3840 256 3

that way you will get an image that is 5 * 1024 wide; 5 * 768 high with 256 colors (alas you will only get 1 or 2) and 3 threads (1 main thread that doesn't do any work except creating the worker threads, and 2 worker threads)

Sven
A: 

Since the benchmarks game moved to a quad-core machine September 2008, many programs in different programming languages have been re-written to exploit quad-core - for example, the first 10 mandelbrot programs.

igouy