I am writing an abstract matrix class (and some concrete subclasses) for use on very differing hardwares/architectures, etc. and I want to write a row and column type that provides a transparent reference to the rows and columns of the matrix.
However, I want to tune for performance, so I'd like this class to be essentially a compiler construct. In other words, I'm willing to sacrifice some dev time to making the overhead of these classes as small as possible.
I assume all (small) methods would want to be inline? Keep the structure small? Any other suggestions?