views:

210

answers:

4

I have a large code base, originally C ported to C++ many years ago, that is operating on a number of large arrays of spatial data. These arrays contain structs representing point and triangle entities that represent surface models. I need to refactor the code such that the specific way these entities are stored internally varies for specific scenarios. For example if the points lie on a regular flat grid, I don't need to store the X and Y coordinates, as they can be calculated on the fly, as can the triangles. Similarly, I want to take advantage of out of core tools such as STXXL for storage. The simplest way of doing this is replacing array access with put and get type functions, e.g.

point[i].x = XV;

becomes

Point p = GetPoint(i);
p.x = XV;
PutPoint(i,p);

As you can imagine, this is a very tedious refactor on a large code base, prone to all sorts of errors en route. What I'd like to do is write a class that mimics the array by overloading the [] operator. As the arrays already live on the heap, and move around with reallocs, the code already assumes that references into the array such as

point *p = point + i;

may not be used. Is this class feasible to write? For example writing the methods below in terms of the [] operator;

void MyClass::PutPoint(int Index, Point p)
{
   if (m_StorageStrategy == RegularGrid)
   {
      int xoffs,yoffs;
      ComputeGridFromIndex(Index,xoffs,yoffs);
      StoreGridPoint(xoffs,yoffs,p.z);
    } else
       m_PointArray[Index] = p;   
  }
}

Point MyClass::GetPoint(int Index)
{
   if (m_StorageStrategy == RegularGrid)
   {
      int xoffs,yoffs;
      ComputeGridFromIndex(Index,xoffs,yoffs);
      return GetGridPoint(xoffs,yoffs);   // GetGridPoint returns Point
    } else
       return m_PointArray[Index];   
  }
}

My concern is that all the array classes I've seen tend to pass by reference, whereas I think I'll have to pass structs by value. I think it should work put other than performance, can anyone see any major pitfalls with this approach. n.b. the reason I have to pass by value is to get

point[a].z = point[b].z + point[c].z

to work correctly where the underlying storage type varies.

+5  A: 

You should not need to pass the array by value. For mutating the values in the array, you want two versions of operator[], one which returns a reference (to mutate) and one a const reference.

There is no reason in principle not to use operator[], as long as you do not need to vary the type of the storage at run time - there are no virtual operators, so you would need a named function if you want runtime polymorphism. In that case, you can create a simple struct which adapts the operator calls to function calls (though it rather depends on the storage API - if the code assumes that assigning to the point's member variables changes the stored data, you might have to make the point type a template variable too so this can be overridden).

Looking at your sample code, it has a test for the storage strategy. Do not do this. Either use OO and have your storage object implement a common virtual interface, or (probably better) use template programming to vary the storage mechanism.

If you look at the guarantees made by std::vector (in more recent C++ standards), then it is possible to have something which has dynamic storage and allows use of pointer arithmetic, though that requires contiguous storage. Given that some of your values are created on the fly, it is probably not worth placing that restriction on your implementations, but the constraint itself does not prevent use of operator[].

Pete Kirkham
Returning a reference to the point implies having multiple unique points for all the potentially live references that are active at the same time. This will require a cache of transformed data, but should be workable. Thanks for the answer.
Shane MacLaughlin
WRT use of a storage strategy rather than OO or templates, it wouldn't work as the existing code is not written to support it, and a principal requirement is to minimise the amount of rewrite. I want to drop in something that is run-time polymorphic at the back end, but at the front end mimics a C array. If I was writing it from scratch, templates are undoubtably the way to go.
Shane MacLaughlin
+2  A: 

What you want is possible, but as you need write access as well, the result will be a little bit more complex sometimes. What you want is the setter function returning not a direct "Point write access", rather a temporary copy, which will do the write once the copy goes out of the scope.

Following code fragment tries to outline the solution:

class PointVector
{
  MyClass container_;

  public:
  class PointExSet: public Point
  {
    MyClass &container_;
    int index_;

    public:
    PointExSet(MyClass &container, int index)
      :Point(container.GetVector(index)),container_(container),index_(index)
    {
    }

    ~PointExSet()
    {
      container_.PutVector(index_) = *this;
    }
  };

  PointExSet operator [] (int i)
  {
    return PointExSet(container_,i);
  }
};

It is not as nice as you would probably hope it to be, but I am afraid you cannot get a much better solution in C++.

Suma
Yup, hence the need for references. I think I should be able do it using a cache of the transformed type, with an index the same size as the array used to maintain the cache.
Shane MacLaughlin
+1  A: 

To have a full control over operations on array, operator[] should return a special object (invented long ago and called "cursor") that will handle operations for you. As an example:

class Container
{
  PointCursor operator [] (int i)
  {
    return PointCursor(this,i);
  }
};
class PointCursor
{
public:
    PointCursor(_container, _i)
       : container(_container), i(_i),
         //initialize subcursor
         x(container, i) {}     

    //subcursor
    XCursor x;
private:
   Container* container;
   int i;
};
class XCursor
{
public:
    XCursor(_container, _i)
      : container(_container), i(_i) {}

     XCursor& operator = (const XCursor& xc)
     {
          container[i].x = xc.container[xc.i].x;
          //or do whatever you want over x
     }

     Container* container;
     int i; 
}
//usage
my_container[i].x = their_container[j].x; //calls XCursor::operator = ()
Alsk
A: 

After reading the above answers, I decided that Pete's answer with two versions of operator[] was the best way forward. To handle the morphing between types at run-time I created a new array template class that took four parameters as follows;

template<class TYPE, class ARG_TYPE,class BASE_TYPE, class BASE_ARG_TYPE>
class CMorphArray 
{
int GetSize() { return m_BaseData.GetSize(); }
BOOL IsEmpty() { return m_BaseData.IsEmpty(); }

// Accessing elements
const TYPE& GetAt(int nIndex) const;
TYPE& GetAt(int nIndex);
void SetAt(int nIndex, ARG_TYPE newElement);
const TYPE& ElementAt(int nIndex) const;
TYPE& ElementAt(int nIndex);

// Potentially growing the array
int Add(ARG_TYPE newElement);

// overloaded operator helpers
const TYPE& operator[](int nIndex) const;
TYPE& operator[](int nIndex);

   CBigArray<BASE_TYPE, BASE_ARG_TYPE>  m_BaseData;
private:
   CBigArray<TYPE, ARG_TYPE>    m_RefCache;
   CBigArray<int, int&> m_RefIndex;
   CBigArray<int, int&> m_CacheIndex;

   virtual void Convert(BASE_TYPE,ARG_TYPE) = 0;
   virtual void Convert(TYPE,BASE_ARG_TYPE) = 0;

   void InitCache();
   TYPE&    GetCachedElement(int nIndex);
};

The main data storage is in m_BaseData which is the data in its native format, which can vary in type as discussed. m_RefCache is secondary array to cache of elements in the expected format, and the GetCachedElement function uses the virtual Convert functions to translate the data as it is moved in and out of the cache. The cache needs to be at least as big as the number of simultaneous references that can be active at any one time, but in my case will probably benefit from being bigger as it reduces the number of conversions required. While Alsk's cursor implementation probably would have worked well, the solution given requires fewer object copies and temporary variables, and ought to afford slightly better performance which is important in this case.

Apologies to all you STL fans for the older MFC look and feel; the rest of the project is MFC so it makes more sense in this case. The CBigArray was the result of a related stack overflow question that became the basis of my large array handling. I hope to finish the implementation today and test tomorrow. If it all goes belly up on me, I'll edit this post accoringly.

Shane MacLaughlin