I am implementing a templated sparse_vector class. It's like a vector, but it only stores elements that are different from their default constructed value.
So, sparse_vector would store the lazily-sorted index-value pairs for all indices whose value is not T().
I am basing my implementation on existing sparse vectors in numeric libraries-- though mine will handle non-numeric types T as well. I looked at boost::numeric::ublas::coordinate_vector
and eigen::SparseVector
.
Both store:
size_t* indices_; // a dynamic array
T* values_; // a dynamic array
int size_;
int capacity_;
Why don't they simply use
vector<pair<size_t, T>> data_;
My main question is what are the pros and cons of both systems, and which is ultimately better?
The vector of pairs manages size_ and capacity_ for you, and simplifies the accompanying iterator classes; it also has one memory block instead of two, so it incurs half the reallocations, and might have better locality of reference.
The other solution might search more quickly since the cache lines fill up with only index data during a search. There might also be some alignment advantages if T is an 8-byte type?
It seems to me that vector of pairs is the better solution, yet both containers chose the other solution. Why?