views:

3380

answers:

8

I want to create an adjacency matrix for a graph. Since I read it is not safe to use arrays of the form matrix[x][y] because they don't check for range, I decided to use the vector template class of the stl. All I need to store in the matrix are boolean values. So my question is, if using std::vector<std::vector<bool>* >* produces too much overhead or if there is a more simple way for a matrix and how I can properly initialize it.

EDIT: Thanks a lot for the quick answers. I just realized, that of course I don't need any pointers. The size of the matrix will be initialized right in the beginning and won't change until the end of the program. It is for a school project, so it would be good if I write "nice" code, although technically performance isn't too important. Using the STL is fine. Using something like boost, is probably not appreciated.

A: 

Consider also how big is your graph/matrix, does performance matter a lot? Is the graph static, or can it grow over time, e.g. by adding new edges?

siddhadev
+6  A: 

Note that also you can use boost.ublas for matrix creation and manipulation and also boost.graph to represent and manipulate graphs in a number of ways, as well as using algorithms on them, etc.

Edit: Anyway, doing a range-check version of a vector for your purposes is not a hard thing:

template <typename T>
class BoundsMatrix
{
        std::vector<T> inner_;
        unsigned int dimx_, dimy_;

public:
        BoundsMatrix (unsigned int dimx, unsigned int dimy)
                : dimx_ (dimx), dimy_ (dimy)
        {
                inner_.resize (dimx_*dimy_);
        }

        T& operator()(unsigned int x, unsigned int y)
        {
                if (x >= dimx_ || y>= dimy_)
                        throw 0; // ouch
                return inner_[dimx_*y + x];
        }
};

Note that you would also need to add the const version of the operators, and/or iterators, and the strange use of exceptions, but you get the idea.

Diego Sevilla
+1  A: 

Mind you std::vector doesn't do range checking as well.

shoosh
unless you use a debugging version of stdlib, or you're calling vector::at
+2  A: 

What I would do is create my own class for dealing with matrices (probably as an array[x*y] because I'm more used to C (and I'd have my own bounds checking), but you could use vectors or any other sub-structure in that class).

Get your stuff functional first then worry about how fast it runs. If you design the class properly, you can pull out your array[x*y] implementation and replace it with vectors or bitmasks or whatever you want without changing the rest of the code.

I'm not totally sure, but I thing that's what classes were meant for, the ability to abstract the implementation well out of sight and provide only the interface :-)

paxdiablo
+1  A: 

Best way:

Make your own matrix class, that way you control every last aspect of it, including range checking.

eg. If you like the "[x][y]" notation, do this:

class my_matrix {
  std::vector<std::vector<bool> >m;
public:
  my_matrix(unsigned int x, unsigned int y) {
    m.resize(x, std::vector<bool>(y,false));
  }
  class matrix_row {
    std::vector<bool>& row;
  public:
    matrix_row(std::vector<bool>& r) : row(r) {
    }
    bool& operator[](unsigned int y) {
      return row.at(y);
    }
  };
  matrix_row& operator[](unsigned int x) {
    return matrix_row(m.at(x));
  }
};
// Example usage
my_matrix mm(100,100);
mm[10][10] = true;

nb. If you program like this then C++ is just as safe as all those other "safe" languages.

Jimmy J
+1  A: 

In addition to all the answers that have been posted so far, you might do well to check out the C++ FAQ Lite. Questions 13.10 - 13.12 and 16.16 - 16.19 cover several topics related to rolling your own matrix class. You'll see a couple of different ways to store the data and suggestions on how to best write the subscript operators.

Also, if your graph is sufficiently sparse, you may not need a matrix at all. You could use std::multimap to map each vertex to those it connects.

Kristo
+3  A: 

The standard vector does NOT do range checking by default.

i.e. The operator[] does not do a range check.

The method at() is similar to [] but does do a range check.
It will throw an exception on out of range.

std::vector::at()
std::vecyor::operator[]()

Other notes: Why a vector<Pointers> ?
You can quite easily have a vector<Object>. Now there is no need to worry about memory management (i.e. leaks).

std::vector<std::vector<bool> >   m;

Note: vector<bool> is overloaded and not very efficient (i.e. this structure was optimized for size not speed) (It is something that is now recognized as probably a mistake by the standards committee).

If you know the size of the matrix at compile time you could use std::bitset?

std::vector<std::bitset<5> >    m;

or if it is runtime defined use boost::dynamic_bitset

std::vector<boost::dynamic_bitset>  m;

All of the above will allow you to do:

m[6][3] = true;
Martin York
+1  A: 

If you want 'C' array performance, but with added safety and STL-like semantics (iterators, begin() & end() etc), use boost::array.

Basically it's a templated wrapper for 'C'-arrays with some NDEBUG-disable-able range checking asserts (and also some std::range_error exception-throwing accessors).

I use stuff like

boost::array<boost::array<float,4>,4> m;

instead of

float m[4][4];

all the time and it works great (with appropriate typedefs to keep the verbosity down, anyway).


UPDATE: Following some discussion in the comments here of the relative performance of boost::array vs boost::multi_array, I'd point out that this code, compiled with g++ -O3 -DNDEBUG on Debian/Lenny amd64 on a Q9450 with 1333MHz DDR3 RAM takes 3.3s for boost::multi_array vs 0.6s for boost::array.

#include <iostream>
#include <time.h>
#include "boost/array.hpp"
#include "boost/multi_array.hpp"

using namespace boost;

enum {N=1024};

typedef multi_array<char,3> M;
typedef array<array<array<char,N>,N>,N> C;

// Forward declare to avoid being optimised away
static void clear(M& m);
static void clear(C& c);

int main(int,char**)
{
  const clock_t t0=clock();

  {
    M m(extents[N][N][N]);
    clear(m);
  }

  const clock_t t1=clock();

  {
    std::auto_ptr<C> c(new C);
    clear(*c);
  }

  const clock_t t2=clock();

  std::cout 
    << "multi_array: " << (t1-t0)/static_cast<float>(CLOCKS_PER_SEC) << "s\n"
    << "array      : " << (t2-t1)/static_cast<float>(CLOCKS_PER_SEC) << "s\n";

  return 0;
}

void clear(M& m)
{
  for (M::index i=0;i<N;i++)
    for (M::index j=0;j<N;j++)
      for (M::index k=0;k<N;k++)
    m[i][j][k]=1;
}


void clear(C& c)
{
  for (int i=0;i<N;i++)
    for (int j=0;j<N;j++)
      for (int k=0;k<N;k++)
    c[i][j][k]=1;
}
timday
For multidimensional arrays, boost::multiarray ( http://www.boost.org/doc/libs/1_38_0/libs/multi_array/doc/index.html ) is probably a better choice.
janneb
No way; I've tried that in the past and it's performance was horrific compared with the C-array equivalent. And its storage is always heap allocated c.f boost::array which can use the stack.
timday
Fascinating; in my benchmarks there is essentially no performance difference between a static C style array vs. boost::multiarray. Then again, I tested element access of large arrays so heap allocation performance is not a problem.
janneb
See update for basis of my belief it's slow. I'm using g++ 4.3.2, boost 1.35.
timday
Thanks for the benchmarks, I've been looking at it to see why the results are different from my tests. There are two things that make up for the major differences in timing. 1) Use of char instead of double 2) Including the initialization in the timing. For example, if you use (CONTINUED...)
janneb
double or float, and don't include initialization, then multi_array is slightly faster. Some experimenting leads me to believe that multi_array initialization zeroes the array; it is responsible for half the time (testing with different sized arrays). However, for char and int, array (CONTINUED...)
janneb
consistently outperforms multi_array. This is with g++ 4.1.2, boost 1.33.
janneb