views:

949

answers:

8

I'm a beginner C++ programmer and so I've learnt using arrays rather than vectors (this seems to be the general way to do things, then move on to vectors later on).

I've noticed that a lot of answers on SO suggest using vectors over arrays, and strings over char arrays. It seems that this is the "proper" way to code in C++.

That all being said, when is it still worth using a classic array/char* (if ever)?

+12  A: 

When writing code that should used in other projects, in particular if you target special platforms (embedded, game consoles, etc.) where STL might not be present.

Old projects or projects with special requirements might not want to introduce dependencies on STL libraries. An interface depending on arrays, char* or whatever will be compatible with anything since it's part of the language. STL however is not guaranteed to be present in all build environments.

Laserallan
+5  A: 

I suggest to use arrays whenever you know the size at the compile time itself. Although vector can be used, we have to remember that vector comes with its overhead related to the memory allocation done on heap. If the size is not known, then ofcourse use vectors.

Naveen
You could also use boost::array or tr1::array.
Anonymous
How is the copying overhead any different than an array? The only situation I can think of where arrays would be more efficient is if you had "Foo arr[] = {Foo(...),Foo(...),...};"
Mr Fooz
@Mr Fooz: Yes, your are right..there is no difference. I edited my answer.
Naveen
Also remember that arrays are dangerous for data types that have a nontrivial destructor. Never use arrays of them unless you really, really have to.
David Thornley
+4  A: 

Array/char* is useful when compatibility or performance is of very high priority. Vectors and Strings are higher-level objects that are better when code maintainability, readability and overal easiness counts. Almost always, that is.

Joonas Pulakka
+1  A: 

The only reason I could think of would be speed. You can do better optimizations on array/pointer types than on the according objects. But I would even use the STL if I absoluteltely knew the amount of data my data structures need to hold. Changing from STL to primitive types in an optimization step is better than starting the project with harder-to-read code.

soulmerge
Good point. From what I've seen intels c++ compiler tend to output way more LOOP WAS VECTORIZED in our C code than our c++ code (where the latter uses lots of STL)
Laserallan
std::vector is guaranteed to use a flat array for storage. You can always do all the C-style pointer arithmetic on them.
Mr Fooz
+1  A: 

I see two reasons:

  1. Compatibility (old code without STL).
  2. Speed. (I compared the speed of using vector/binary_search & array/handwritten binary search. For the last one ugly code was obtained (with reallocation of memory), but it was about 1.2-1.5 times faster then first one, I used MS VC++ 8)
sergdev
+8  A: 

Never.

If a raw array seems a better solution than a vector (for reasons other said here) then I use std::tr1::array (or boost::array). It simply does the checks I would do anyway to be sure and the size value usage makes DRY automatically implemented (for example I use the size in loops so that future changes of the array declaration will automatically work).

It's the array implementation "is" a raw array with checks and provided size constant anyway, so it's easy to get the array code in embedded code too because the code is not really "too smart" for any compiler. As far as the compiler support templates, I would copy the boost headers in my code to allow me to use this one instead of raw arrays. Because it's clearly too easy to make mistakes with raw arrays. Raw arrays are evil. They are error prone.

And it work very well with STL algorithms (if available).

Now, there is two cases where you need to use raw arrays (obligation) : when you're in C-only code (not communicating with C code, but writing in C-only part of code, like a C library). But then it's another language.

The other reason is when the compiler don't support templates at all...

Klaim
I quite agree. Something very like boost::array should have been a no-brainer for the original STL; I've never understood why one wasn't included (unless integer template parameters were a later addition to the language ?)
timday
Yeah it seem they tried to fix that with TR1 but compilers didn't got it fast enough to make it popular. :/Anyway today there is no reason to use raw arrays when std::tr1::array or boost::array is compilable. C++0x will help later to make it "the standard way" I guess.
Klaim
+4  A: 

This question could actually be separated into two parts:

  1. How should I manage memory for flat array data?
  2. How should I access elements of a flat array?

I personally prefer to use std::vector for managing memory except in cases where I need to maintain compatibility with code that doesn't use STL (i.e. when interfacing with straight C code). It's much harder to make exception-safe code with raw arrays allocated via new or malloc (in part because it's really easy to forget that you need to worry about it). See any article on RAII for the reasons.

In practice, std::vector is implemented as a flat array. As such, it's always possible to pull out the raw array and use C-style access patterns. I typically start with the vector subscript operator syntax. For some compilers, when producing a debug version, vectors provide automatic boundary checking. This is slow (often a 10x slowdown for tight loops), but helpful in finding certain types of bugs.

If profiling on a particular platform indicates that the operator[] is a bottleneck, then I switch to directly accessing the raw array. Interestingly, depending on the compiler and OS, it can sometimes be faster to use an STL vector than a raw array.

Here are some results from a simple test application. It was compiled with Visual Studio 2008 in 32-bit Release mode using /O2 optimizations and run on Vista x64. Similar results are achieved with a 64-bit test application.

Binary search...
           fill vector (for reference) :  0.27 s
                   array with ptr math :  0.38 s <-- C-style pointers lose
                  array with int index :  0.23 s <-- [] on raw array wins
            array with ptrdiff_t index :  0.24 s
                 vector with int index :  0.30 s  <-- small penalty for vector abstraction
           vector with ptrdiff_t index :  0.30 s

Counting memory (de)allocation...
                memset (for reference) :  2.85 s
      fill malloc-ed raw array with [] :  2.66 s
     fill malloc-ed raw array with ptr :  2.81 s
         fill new-ed raw array with [] :  2.64 s
        fill new-ed raw array with ptr :  2.65 s
                  fill vector as array :  3.06 s  \ something's slower 
                           fill vector :  3.05 s  / with vector!

NOT counting memory (de)allocation...
                memset (for reference) :  2.57 s
      fill malloc-ed raw array with [] :  2.86 s
     fill malloc-ed raw array with ptr :  2.60 s
         fill new-ed raw array with [] :  2.63 s
        fill new-ed raw array with ptr :  2.78 s
                  fill vector as array :  2.49 s \ after discounting the  
                           fill vector :  2.54 s / (de)allocation vector is faster!

Code:

#define WINDOWS_LEAN_AND_MEAN
#include <windows.h>
#include <string>
#include <vector>
#include <stdio.h>

using namespace std;

__int64 freq; // initialized in main
int const N = 1024*1024*1024/sizeof(int)/2; // 1/2 GB of data
int const nIter = 10;

class Timer {
public:
  Timer(char *name) : name(name) {
    QueryPerformanceCounter((LARGE_INTEGER*)&start);
  }
  ~Timer() {
    __int64 stop;
    QueryPerformanceCounter((LARGE_INTEGER*)&stop);
    printf("  %36s : % 4.2f s\n", name.c_str(), (stop - start)/double(freq));
  }
private:
  string const name;
  __int64 start;
};


template <typename Container, typename Index>
int binarySearch_indexed(Container sortedArray, Index first, Index last, int key) {
  while (first <= last) {
    Index mid = (first + last) / 2; // NOT safe if (first+last) is too big!
    if (key > sortedArray[mid])      first = mid + 1;
    else if (key < sortedArray[mid])  last = mid - 1; 
    else return mid;  
  }
  return 0; // Use "(Index)-1" in real code
}

int Dummy = -1;
int const *binarySearch_ptr(int const *first, int const *last, int key) {
  while (first <= last) {
    int const *mid = (int const *)(((unsigned __int64)first + (unsigned __int64)last) / 2);  
    if (key > *mid)      first = mid + 1;
    else if (key < *mid)  last = mid - 1; 
    else return mid;  
  }
  return &Dummy; // no NULL checks: don't do this for real
}

void timeFillWithAlloc() {
  printf("Counting memory (de)allocation...\n");
  { 
    Timer tt("memset (for reference)");
    int *data = (int*)malloc(N*sizeof(int));
    for (int it=0; it<nIter; it++) memset(data, 0, N*sizeof(int));
    free(data);
  }
  { 
    Timer tt("fill malloc-ed raw array with []");
    int *data = (int*)malloc(N*sizeof(int));
    for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    free(data);
  }
  { 
    Timer tt("fill malloc-ed raw array with ptr");
    int *data = (int*)malloc(N*sizeof(int));
    for (int it=0; it<nIter; it++) {
    int *d = data;
    for (size_t i=0; i<N; i++) *d++ = (int)i;
    }
    free(data);
  }
  { 
    Timer tt("fill new-ed raw array with []");
    int *data = new int[N];
    for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    delete [] data;
  }
  { 
    Timer tt("fill new-ed raw array with ptr");
    int *data = new int[N];
    for (int it=0; it<nIter; it++) {
    int *d = data;
    for (size_t i=0; i<N; i++) *d++ = (int)i;
    }
    delete [] data;
  }
  { 
    Timer tt("fill vector as array");
    vector<int> data(N); 
    for (int it=0; it<nIter; it++) {
      int *d = &data[0]; 
    for (size_t i=0; i<N; i++) *d++ = (int)i;
    }
  }
  { 
    Timer tt("fill vector");
    vector<int> data(N); 
    for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
  }
  printf("\n");
}

void timeFillNoAlloc() {
  printf("NOT counting memory (de)allocation...\n");

  { 
    int *data = (int*)malloc(N*sizeof(int));
    {
      Timer tt("memset (for reference)");
      for (int it=0; it<nIter; it++) memset(data, 0, N*sizeof(int));
    }
    free(data);
  }
  { 
    int *data = (int*)malloc(N*sizeof(int));
    {
      Timer tt("fill malloc-ed raw array with []");
      for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    }
    free(data);
  }
  { 
    int *data = (int*)malloc(N*sizeof(int));
    {
      Timer tt("fill malloc-ed raw array with ptr");
      for (int it=0; it<nIter; it++) {
        int *d = data;
        for (size_t i=0; i<N; i++) *d++ = (int)i;
      }
    }
    free(data);
  }
  { 
    int *data = new int[N];
    {
      Timer tt("fill new-ed raw array with []");
      for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    }
    delete [] data;
  }
  { 
    int *data = new int[N];
    {
      Timer tt("fill new-ed raw array with ptr");
      for (int it=0; it<nIter; it++) {
        int *d = data;
        for (size_t i=0; i<N; i++) *d++ = (int)i;
      }
    }
    delete [] data;
  }
  { 
    vector<int> data(N); 
    {
      Timer tt("fill vector as array");
      for (int it=0; it<nIter; it++) {
        int *d = &data[0]; 
        for (size_t i=0; i<N; i++) *d++ = (int)i;
      }
    }
  }
  { 
    vector<int> data(N); 
    {
      Timer tt("fill vector");
      for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    }
  }
  printf("\n");
}

void timeBinarySearch() {
  printf("Binary search...\n");
  vector<int> data(N); 
  {
    Timer tt("fill vector (for reference)");
    for (size_t i=0; i<N; i++) data[i] = (int)i;
  }

  {
    Timer tt("array with ptr math");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += *binarySearch_ptr(&data[0], &data[0]+data.size(), i);
    }
  }
  {
    Timer tt("array with int index");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += data[binarySearch_indexed<int const *, int>(
        &data[0], 0, (int)data.size(), -1)];
    }
  }
  {
    Timer tt("array with ptrdiff_t index");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += data[binarySearch_indexed<int const *, ptrdiff_t>(
        &data[0], 0, (ptrdiff_t)data.size(), -1)];
    }
  }
  {
    Timer tt("vector with int index");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += data[binarySearch_indexed<vector<int> const &, int>(
        data, 0, (int)data.size(), -1)];
    }
  }
  {
    Timer tt("vector with ptrdiff_t index");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += data[binarySearch_indexed<vector<int> const &, ptrdiff_t>(
        data, 0, (ptrdiff_t)data.size(), -1)];
    }
  }

  printf("\n");
}

int main(int argc, char **argv)
{
  QueryPerformanceFrequency((LARGE_INTEGER*)&freq);

  timeBinarySearch();
  timeFillWithAlloc();
  timeFillNoAlloc();

  return 0;
}
Mr Fooz
std::vector is not guaranteed to be a flat array by the standard, which surprised a lot of C++ Standard Committee members when they realized it. I don't know of any implementations that don't contain a flat array, and the upcoming C++ standard will guarantee that.
David Thornley
@David: Interesting. The post has been updated.
Mr Fooz
A: 

I work on a shared library that needs access to structured data. This data is known at compile time, so it uses file-scoped constant arrays of POD (plain old data) structures to hold the data.

This causes the compiler and linker to put most of the data in a read-only section, with two benefits:

  • It can be mapped into memory directory from disk without running any special initialization code.
  • It can be shared between processes.

The only exception is that the compiler still generates initialization code to load floating point constants, so any structure containing floating point numbers ends up in a writable section. I suspect that this has something do with floating exceptions or floating point rounding modes, but I'm not sure how to verify either hypothesis.

If I used vector and string objects for this, then the compiler would generate a lot more initialization code that would execute whenever my shared library is loaded. The constant data would be allocated on the heap, so it would not be shareable between processes.

If I read the data in from a file on disk, I would have to deal with checking the format of the data, instead of having the C++ compiler do it for me. I also would have to manage the lifetime of the data in a codebase that has had this global data "baked into" it since the beginning.

bk1e