I am writing some C++ code in Linux where I have declared a few 2D arrays like so:

 double x[5000][500], y[5000][500], z[5000][500];

During compilation there is no error. When I execute it says "segmentation fault".

When I reduce the size of the array from 5000 to 50, the program runs fine. How can I protect myself against this problem?

+2  A: 

Your declaration should appear at top level, outside any procedure or method.

By far the easiest way to diagnose a segfault in C or C++ code is to use valgrind. If one of your arrays is at fault, valgrind will pinpoint exactly where and how. If the fault lies elsewhere, it will tell you that, too.

valgrind can be used on any x86 binary but will give more information if you compile with gcc -g.

Norman Ramsey
Putting it at top level will cause it to use the heap. Another way to do this is to use new or malloc (remember to free memory appropriately when done).
Matthew Flaschen
Toplevel makes it global (uninitialized data, never recovered). Usually when people talk about the heap they mean dynamically allocated. Problem with malloc is that you lose the 2D array notation...
Norman Ramsey
@Norman Ramsey Agreed, being able to write x[i][j] instead of x[i*XDIM+j] is a big advantage for most non-obsessive coders. I suspect the OP is a physicist.
Thomas L Holaday
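
On the point about losing the 2D notation: when the inner dimension is a compile-time constant, a single new[] expression can keep the x[i][j] syntax. This is only a minimal sketch, assuming the 5000 by 500 sizes from the question; the names ROWS, COLS and corner_sum are made up for illustration.

```cpp
#include <cstddef>

const std::size_t ROWS = 5000;  // outer dimension from the question
const std::size_t COLS = 500;   // inner dimension, must be a compile-time constant

// Allocates one contiguous ROWS x COLS block on the heap, writes two
// elements with ordinary x[i][j] syntax, and returns their sum.
double corner_sum() {
    // x points to rows of COLS doubles; the trailing () zero-initializes the block
    double (*x)[COLS] = new double[ROWS][COLS]();
    x[0][0] = 1.5;
    x[ROWS - 1][COLS - 1] = 2.5;
    double sum = x[0][0] + x[ROWS - 1][COLS - 1];
    delete[] x;  // a single delete[] releases the whole block
    return sum;
}
```

The block still has to be released by hand, which is the usual argument for wrapping it in a container instead.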
+12  A: 

These arrays are on the stack. Stacks are quite limited in size. You have probably run into a ... stack overflow :)

If you want to avoid this, you need to put them on the free store:

double* x = new double[5000*500];

But you had better get into the good habit of using the standard containers, which wrap all this for you:

std::vector< std::vector<double> > x( 5000, std::vector<double>(500) );

Plus: even if the stack fits the arrays, you still need room for functions to put their frames on it.

xtofl
That should probably read: 'double * x = new double[ 5000*500 ]' with a single pointer *. (Or else a more cumbersome real 2D dynamically allocated array...)
David Rodríguez - dribeas
One reservation about always using vector: as far as I understand it, if you walk off the end of the array it just allocates a larger array and copies everything over, which might create subtle and hard-to-find errors when you are really trying to work with a fixed-size array. At least with a real array you'll segfault if you walk off the end, making the error easier to catch.
Robert S. Barnes
@dribeas: thanks. Indeed, multidim arrays aren't my favorites:)
xtofl
@robert: using push_back will cause the vector to grow where needed. If you use it through iterators or direct access, though, no growth is present.
xtofl
@Robert S. Barnes If you write code that "walks off the end of the array", it is seriously broken whether you use std::vector or not.
lothar
@Robert: xtofl is correct. Walking/iterating the array won't change its size.
SilverSun
+1  A: 

Looks to me like you have an honest-to-Spolsky stack overflow!

Try compiling your program with gcc's -fstack-check option. If your arrays are too big to allocate on the stack, you'll get a StorageError exception.

I think it's a good bet, though, as 5000*500*3 doubles (8 bytes each) comes to around 60 megs, and the default stack limit is typically far smaller than that (often 8 MB on Linux). You'll have to allocate your big arrays on the heap.
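
The arithmetic can be checked at compile time. A small sketch, assuming a C++11 compiler; the constant and function names are illustrative, and the 8 MiB figure in the comment is a common Linux default rather than a guarantee for any given system.

```cpp
#include <cstddef>

// Sizes from the question; the names are illustrative.
constexpr std::size_t ROWS = 5000;
constexpr std::size_t COLS = 500;
constexpr std::size_t ARRAYS = 3;  // x, y and z

// Bytes needed if all three arrays live in one stack frame.
constexpr std::size_t total_bytes() {
    return ARRAYS * ROWS * COLS * sizeof(double);
}

// With 8-byte doubles this is 60,000,000 bytes, far above a
// typical default stack limit (often 8 MiB on Linux).
static_assert(sizeof(double) != 8 || total_bytes() == 60000000u,
              "3 * 5000 * 500 doubles of 8 bytes each is 60 MB");
```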

Charlie Tangora
When I use the -fstack_check option in both gcc and g++, it tells me: cc1plus: error: unrecognized command line option "-fstack_check"
Try -fstack-check instead of -fstack_check
Anthony Cramp
double m_x[500000][500],m_y[500000][500],m_z[500000][500]; When I run a program with only the statement above in main(), it executes without a problem.
@kar What is your optimization level? I believe gcc strips unused variables at anything above -O0. A function containing only a plain-old-data declaration should compile to only a return, which means that it uses a massive stack only in theory, not in practice. (Sorry about the underscore instead of dash. Fixed in the answer.)
Charlie Tangora
+13  A: 

If your program looks like this ...

int main(int, char **) {
   double x[5000][500],y[5000][500],z[5000][500];
   // ...
   return 0;
}

... then you are overflowing the stack. The fastest way to fix this is to add the word static.

int main(int, char **) {
   static double x[5000][500],y[5000][500],z[5000][500];
   // ...
   return 0;
}

The second fastest way to fix this is to move the declaration out of the function:

double x[5000][500],y[5000][500],z[5000][500];
int main(int, char **) {
   // ...
   return 0;
}

The third fastest way to fix this is to allocate the memory on the heap:

int main(int, char **) {
   double **x = new double*[5000];
   double **y = new double*[5000];
   double **z = new double*[5000];
   for (size_t i = 0; i < 5000; i++) {
      x[i] = new double[500];
      y[i] = new double[500];
      z[i] = new double[500];
   }
   // ...
   for (size_t i = 5000; i > 0; ) {
      delete[] z[--i];
      delete[] y[i];
      delete[] x[i];
   }
   delete[] z;
   delete[] y;
   delete[] x;

   return 0;
}

The fourth fastest way is to allocate them on the heap using std::vector. It is fewer lines in your file but more lines in the compilation unit, and you must either think of a meaningful name for your derived vector types or tuck them into an anonymous namespace so they won't pollute the global namespace:

#include <vector>
using std::vector;
namespace { 
  struct Y : public vector<double> { Y() : vector<double>(500) {} };
  struct XY : public vector<Y> { XY() : vector<Y>(5000) {} } ;
}
int main(int, char **) {
  XY x, y, z;
  // ...
  return 0;
}

The fifth fastest way is to allocate them on the heap, but use templates so the dimensions are not so remote from the objects:

#include <vector>
using namespace std;
namespace {
  template <size_t N>
  struct Y : public vector<double> { Y() : vector<double>(N) {} };
  template <size_t N1, size_t N2>
  struct XY : public vector< Y<N2> > { XY() : vector< Y<N2> > (N1) {} } ;
}
int main(int, char **) {
  XY<5000,500> x, y, z;
  XY<500,50> mini_x, mini_y, mini_z;
  // ...
  return 0;
}

The most performant way is to allocate the two-dimensional arrays as one-dimensional arrays, and then use index arithmetic.
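
A minimal sketch of the index-arithmetic approach, assuming row-major layout. The class name Grid2D and its interface are hypothetical, invented for this example; it is one possible wrapper, not a standard facility.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a fixed-size 2D grid stored as one contiguous
// heap-allocated block in row-major order.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Element (i, j) lives at offset i * cols + j.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);  // debug-only bounds check
        return data_[i * cols_ + j];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

With this, Grid2D x(5000, 500); followed by x(4999, 499) = 1.0; replaces the x[i][j] notation with x(i, j), and each grid is a single allocation that the vector frees automatically.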

All the above assumes that you have some reason, a good one or a poor one, for wanting to craft your own multidimensional array mechanism. If you have no reason, and expect to use multidimensional arrays again, strongly consider installing a library:

Thomas L Holaday
Recheck conditions. You allocate 5000 double* but then only allocate 500 of them (the for loop should iterate to 5000)
David Rodríguez - dribeas
Nice one! I didn't think about the static. Might note this makes the function non-reentrant, though!
xtofl
@dribeas thanks for the bug report.
Thomas L Holaday
+4  A: 

You may want to try Boost.MultiArray:

#include <boost/multi_array.hpp>
typedef boost::multi_array<double, 2> Double2d;
Double2d x(boost::extents[5000][500]);
Double2d y(boost::extents[5000][500]);
Double2d z(boost::extents[5000][500]);

The actual large memory chunk will be allocated on the heap and automatically deallocated when necessary.

Benoît
+1  A: 

Another solution, in addition to the previous ones, is to run

ulimit -s stack_area

in the shell before starting the program, to raise the maximum stack size (in bash, stack_area is given in kilobytes).
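
Before raising the limit it can help to see what it currently is. A hedged sketch for POSIX systems using getrlimit; the function name stack_limit_bytes and its -1 and -2 return codes are invented for this example.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK (POSIX)

// Returns the current soft stack limit in bytes, or -1 if the query
// fails. An unlimited stack (RLIM_INFINITY) is reported as -2.
long long stack_limit_bytes() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -2;
    return static_cast<long long>(rl.rlim_cur);
}
```

Comparing the result with the roughly 60 MB the three arrays need shows how far the limit would have to be raised.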

Tom
How do I use ulimit -s stack_area?
You should execute it from the shell. See man ulimit for more.
Tom
+1  A: 

One reservation about always using vector: as far as I understand it, if you walk off the end of the array it just allocates a larger array and copies everything over, which might create subtle and hard-to-find errors when you are really trying to work with a fixed-size array. At least with a real array you'll segfault if you walk off the end, making the error easier to catch.

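#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    // pointer to a row of 5000 doubles, so array5k[i][j] works
    typedef double (*array5k_t)[5000];

    // one contiguous, zero-initialized block of 5000 rows
    array5k_t array5k = calloc(5000, sizeof(double) * 5000);
    if (array5k == NULL)
        return 1;

    // out of bounds: should generate a segfault (undefined behavior, so not guaranteed)
    array5k[5000][5001] = 10;

    free(array5k);
    return 0;
}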
Robert S. Barnes
Good point about the ever-growing vector. That's only when you use push_back, though. When you index out of it, or iterate over the edge, you'll run into a segfault just the same. There's no guarantee of a segfault, though: only when using no-man's-land markers around the edges.
xtofl
Why not "array5k_t array5k = new double[5000][5000];"?
newacct
because I wrote the example in C, not C++
Robert S. Barnes
the typedef is quite useful, really, to master multidim arrays.
xtofl