views:

391

answers:

13

First of all I apologize for the long lead up to such a simplistic question.

I am implementing a class which serves as a very long 1 dimensional index on a space filling curve or the n-tuple representing the Cartesian coordinate that index corresponds to.

class curvePoint
{
public:
    friend class curveCalculate;

    //Construction and Destruction
    curvePoint(): point(NULL), dimensions(0) {}
    virtual ~curvePoint(){if(point!=NULL) delete[] point;}

    //Mutators
    void convertToIndex(){ if(isTuple()) calc(this); }
    void convertToTuple(){ if(isIndex()) calc(this); }
    void setTuple(quint16 *tuple, int size);
    void setIndex(quint16 *index, int size);
    void setAlgorithm(curveType alg){algorithm = alg;}

    //Inspectors
    bool isIndex(){return current==Index;}
    bool isTuple(){return current==Tuple;}
    size_t size(){return dimensions;}
    quint16 operator[](size_t index);

    enum curveType{HilbertCurve, ZCurve, GrayCodeCurve};
    enum status{Index, Tuple};

private:
    curveCalculate calc;
    curveType algorithm;
    quint16 *point;
    size_t dimensions;
    status current;
};

(The length of the array pointed to by point is dimensions)

Anyways in the implementation of operator[] I was wondering what the best method to achieve bounds checking is. I want to avoid throwing exceptions if at all possible, and the full range of values is usable for each number in the array so a special value to return in case of an out of bounds error is not possible either;

I was thinking of something like this though implemented in the class definition:

quint16 curvePoint::operator[](size_t index)
{
    return point[ index % dimensions ];
}

This makes it so that we never leave the bounds of the array and if well documented I think it would be fine; nevertheless, I am leary of this particular implementation.

Does this look acceptable to others? Is there any other way of doing bounds checking while still satisfying my constraints?

Edit: Calculation of things like Hilbert curves etc are highly messy, messy enough that I do not not want the additional interface for the stl libraries in the way.

Additionally because I will have to convert many thousands of these every time the multidimensional database is queried I do not want the additional cost of the stl function calls in the mix, if at all possible.

I rather like the idea of the assert; but, if I remember correctly that breaks in release builds does it not?

I suppose I can use exceptions, that seems to be what everyone is rooting for, but I am using the Qt libraries and those avoid exceptions for both performance and portability and I was hoping to do the same.

+3  A: 

If what you need is some sort of "circular" array of points, then your solution is ok. However, to me, it looks just like hiding missusage of indexing operator behind some "safe" logic, so I would be against your proposed solution.

If don't want to allow index overflow, then you could check and throw an exception.

quint16 curvePoint::operator[](size_t index)
{
    if( index >= dimensions)
    {
       throw std::overflow_error();
    }
    return point[ index ];
}

If you want to have less overhead, you could avoid exception, by using debug time assertions (assume that the provided index is always valid):

quint16 curvePoint::operator[](size_t index)
{
    assert( index < dimensions);
    return point[ index ];
}

However, I suggest that, instead of using point and dimension members, use a std::vector< quint16> for point storage. It already has an index based access that you could use:

quint16 curvePoint::operator[](size_t index)
{
    // points is declared as std::vector< quint16> points;
    return points[ index ];
}
Cătălin Pitiș
+6  A: 

For me, this solution is unacceptable because you can be hiding a very hard to find bug. Throwing an out of range exception is the way to go, or at least put an assertion in the function.

TimW
If anything, give your users the choice. Don't put checks in when NDEBUG is defined, or define NO_BOUNDS_CHECKING if NDEBUG is defined, and don't check if NO_BOUNDS_CHECKING is defined. That way users can put "#define NO_BOUNDS_CHECKING" before including to get custom behavior.
GMan
As I said in my question I want to avoid exceptions if it is at all possible. And asserts break when compiling without the debug libraries if I remember correctly.
James Matta
I agree that the solution is unacceptable, but prefer the assert to the check and exception. The check would cause a lot of overhead when iterating a large collection, and will usually be supplied by the client in the form of a for loop. `for(size_t i=0; i<collection.size(); ++i)`
iain
@James, if you want to ignore all the advanced language mechanisms then you'll be left with just if... else... checks.
Daniel Daranas
If performance is a issue, create an extra function "uncheckedAccess( ...)" then it is up to the user of the class. With a clear name it is very unlikely that he will fail to check the range.
TimW
Best to follow the STL's example, for consistency. operator[] does no bounds-checking for maximum performance, and at(int idx) does bounds-checking and throws an exception.
Tom
A: 

You could perhaps add an "out of bounds" exception to the [] operator (or at least an assert).

This should catch any problems, especially when debugging.

Alan
+1  A: 

The best method to achieve bounds checking would be to add an assert.

quint16 curvePoint::operator[](size_t index)
{
    assert(index < dimensions);
    return point[index];
}

If your code already depends on Boost libraries, you might want to use BOOST_ASSERT instead.

avakar
+2  A: 

Having an operator[] that never fails sounds nice, but may hide bugs later on, if a calling function uses an illegal offset, finds a value from the beginning of the buffer, and proceeds as if that is a valid value.

Pim
+7  A: 

Anyways in the implementation of operator[] I was wondering what the best method to achieve bounds checking is. I want to avoid throwing exceptions if at all possible, and the full range of values is usable for each number in the array so a special value to return in case of an out of bounds error is not possible either;

Then the remaining options are:

  • Flexible design. What you did. "Fix" the invalid input so that it tries to do something which makes sense. Advantage: Function won't crash. Disadvantage: Clueless callers who access the element one element will get a lie as a result. Imagine a 10-floor building with floors 1 to 10:

You: "Who lives in the 3rd floor?"

Me: "Mary".

You: "Who lives in the 9th floor?"

Me: "Joe".

You: "Who lives in the 1,203rd floor?"

Me: (Wait... 1,203 % 10 = 3...) > "Mary".

You: "Wow, Mary must enjoy great views from up there. So she owns two apartments then?"

  • A bool output parameter indicates success or failure. This option usually ends up in not very usable code. Many users will ignore the return code. You are still left with what you return in the other return value.

  • Design by Contract. Assert that the caller is within bounds. (For a practical approach in C++, see An exception or a bug? by Miro Samek or Simple Support for Design by Contract in C++ by Pedro Guerreiro.)

  • Return a System.Nullable<quint16>. Oops, wait, this is not C#. Well, you could return a pointer to a quint16. This of course has lots of implications which I shall not discuss here and which probably make this option not usable.

My favorite choices are:

  • For a publicly released library: Input will be checked and an exception will be thrown. You ruled out this option, so it is not an option for you. It is still my choice for a publicly released library.
  • For internal code: Design by contract.
Daniel Daranas
I agree with your post, except for the "design by contract" part. DbC is great as a feature implemented in languages (in the same way that strong-typed, compiled languages create some safeguards when correctly used). However, since most languages don't implement it (and that's a shame), I usually tends to take the defensive approach : "every input is potentially wrong, so you better check they aren't". Sure you know your own code and how to use it properly. But the guy working on the same project, or the one who will replace you probably won't.
Ksempac
@Ksempac: DbC Does _not_ take the defensive approach, instead, it takes an agressive one. My function _requires_ that you send a parameter inside bounds. If you want to violate this precondition, go ahead and have fun. In release mode, I'm not responsible for anything that happens. In debug mode of course preconditions will be checked and you will detect your calling error.
Daniel Daranas
+1 for DbC, asserts can be used for DbC in c++ although they are not as nice as Eiffle's support or the features offered by JML for java
iain
A: 

Unless I'm drastically misunderstanding something,

return point[ index % dimensions ];

is not bounds checking at all. It's returning a real value from a totally different part of the line, which will make it much harder to detect bugs.

I would either:

  1. Throw an exception or an assertion (though you said you don't want to do so)
  2. Simply dereference point past the array in a "natural" way (i.e. just skip any internal checking). The advantage over the % is that they're more likely (though undefined is undefined) to get "weird" values and/or an access violation

In the end, the caller is violating your pre-conditions, and you can do whatever you please. But I think these are the most reasonable options.

Also consider what Cătălin said about incorporating built-in STL collections if that's reasonable.

Matthew Flaschen
A: 

Your solution would be nice if you were providing access to the points of an elliptic shape. But it will lead to very nasty bugs if you use it for arbitrary geometric functions, because you knowingly provide a false value.

soulmerge
These points are points in some arbitrarily high dimensional space, I then use the curveCalculate function class to transform those Cartesian coordinates into the 1 dimensional index along a given space filling curve that represents those coordinates.
James Matta
More precisely the class contains a single point. Sorry I didn't make that clear.
James Matta
+10  A: 

The easiest solution is to do as C++ itself does. This limits the amount of surprises that your users will experience.

C++ itself is fairly consistent. Both the built-in [] on pointers and std::vector::operator[] have undefined behavior if you use an out-of-bound array index. If you want bounds checking, be explicit and use std::vector::at

Hence, if you do the same for your class, you can document the out-of-bound behavior as "standard".

MSalters
Interesting point.
Daniel Daranas
This option seems sensible to me, and I am pretty sure that within my multidimensional database code I will be highly careful in checking bounds so I could probably get away with this (especially because no one outside the database code has any excuse for using the curvePoint type).Thanks for the idea. Now it is a toss up in my head between this answer and the answer I wrote immediately after this one.
James Matta
Although this is the standard behaviour, there's nothing to stop an implementation from offering (optional) bounds checking in other places, including [] overloads. And if these are available on your platform, it would seem more sane to switch them on than to switch them off - only bypass them if profiling shows that it is really worth it. So I'd advise you to bounds check everywhere you have the opportunity to do so.
Daniel Earwicker
@ Earwicker: But bounds checking is slower than unchecked access, and in places where you already *know* your bounds are sane, you wouldn't want to invest the extra clock cycles for safe access. That's why C++ has two ways of accessing an element, operator[] and at(). I'm 100% with MSalters.
DevSolar
That's the nice bit about "undefined behavior " - it includes allows checking for bounds, or doing so only in debug mode, or...But consider the `return point[ index % dimensions ];` example - he already gets the "platform" behavior of the underlying operator[]. Any bounds checking you add is on top of that.
MSalters
A: 

Thanks to the comment on the C# feature in the post of Daniel Daranas I have managed to figure out a possible solution. As I stated in my question I am using the Qt libraries. There for I can use QVariant. QVariant can be set to an invalid state that can be checked by the function receiving it. So the code would become something like:

QVariant curvePoint::operator[](size_t index){
    QVariant temp;
    if(index > dimensions){
        temp = QVariant(QVariant::Invalid);
    }
    else{
        temp = QVariant(point[index]);
    }

    return temp;
}

Of course this has the potential of inserting a bit of gnarly overhead into the function so another possibility is to use a pair template.

std::pair<quint16, bool> curvePoint::operator[](size_t index){
    std::pair<quint16, bool> temp;
    if(index > dimensions){
        temp.second = false;
    }
    else{
        temp.second = true;
        temp.first = point[index];
    }
    return temp;
}

Or I could use a QPair, which has exactly the same functionality and would make it so that the STL need not be linked in.

James Matta
+1  A: 

If I were you I would follow the example set by the stl.

In this case std::vector supplies two methods: at which is bounds checked and operator[] which is not. This allows the client to decide with version to use. I would definitely not use the % size(), as this just hides the bug. However bounds checking will add a lot of overhead for when iterating over a large collection, this is why it should be optional. Although I agree with other posters that the assert is a very good idea as this will only cause a performance hit in debug builds.

You should also consider returning references and supplying const and not const versions. Here are the function declarations for std::vector:

reference at(size_type _Pos);
const_reference at(size_type _Pos) const;

reference operator[](size_type _Pos);
const_reference operator[](size_type _Pos) const;

As a good rule of thumb if I am not sure how to specify an API I look for examples of how others specify similar APIs. Also when ever I use an API I try to judge or rate it, find the bits I like and dislike.

iain
A: 

The modulo operator works surprisingly well for array indices -- it also implements negative indices (ie. point[-3] = point[dimensions - 3]). This is easy to work with, so I'd personally recommend the modulo operator as long as it's well-documented.

Azrael
A: 

Another option is to let the caller choose the out-of-bounds policy. Consider:

template <class OutOfBoundsPolicy>
quint16 curvePoint::operator[](size_t index)
{
    index = OutOfBoundsPolicy(index, dimensions);
    return point[index];
}

Then you could define several policies that the caller may choose. For example:

struct NoBoundsCheck {
    size_t operator()(size_t index, size_t /* max */) {
        return index;
    }
};

struct WrapAroundIfOutOfBounds {
    size_t operator()(size_t index, size_t max) {
        return index % max;
    }
};

struct AssertIfOutOfBounds {
    size_t operator()(size_t index, size_t max) {
        assert(index < max);
        return index % max;
    }
};

struct ThrowIfOutOfBounds {
    size_t operator()(size_t index, size_t max) {
        if (index >= max) throw std::domain_error;
        return index;
    }
};

struct ClampIfOutOfBounds {
    size_t operator()(size_t index, size_t max) {
        if (index >= max) index = max - 1;
        return index;
    }
};
Adrian McCarthy