I am implementing a templated sparse_vector class. It's like a vector, but it only stores elements that are different from their default constructed value.

So, sparse_vector would store the lazily-sorted index-value pairs for all indices whose value is not T().

I am basing my implementation on existing sparse vectors in numeric libraries, though mine will handle non-numeric types T as well. I looked at boost::numeric::ublas::coordinate_vector and Eigen::SparseVector.

Both store:

size_t* indices_;  // a dynamic array
T* values_;  // a dynamic array 
int size_;
int capacity_;

Why don't they simply use

vector<pair<size_t, T>> data_;

My main question is: what are the pros and cons of each layout, and which is ultimately better?

The vector of pairs manages size_ and capacity_ for you, and simplifies the accompanying iterator classes; it also has one memory block instead of two, so it incurs half the reallocations, and might have better locality of reference.

The other solution might search more quickly since the cache lines fill up with only index data during a search. There might also be some alignment advantages if T is an 8-byte type?
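To make the comparison concrete, here is a minimal sketch of a lookup under each layout. The names ParallelSparse and PairSparse are made up for illustration, std::vector stands in for the raw dynamic arrays, and the entries are assumed to be already sorted by index. This is not how either library is actually written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Layout A (hypothetical): two parallel arrays, indices sorted ascending.
template <typename T>
struct ParallelSparse {
    std::vector<std::size_t> indices;  // sorted ascending
    std::vector<T> values;             // values[k] belongs to indices[k]

    // Returns a pointer to the stored value, or nullptr if i is not stored.
    const T* find(std::size_t i) const {
        assert(indices.size() == values.size());
        // The binary search reads only the densely packed index array.
        auto it = std::lower_bound(indices.begin(), indices.end(), i);
        if (it == indices.end() || *it != i) return nullptr;
        return &values[static_cast<std::size_t>(it - indices.begin())];
    }
};

// Layout B (hypothetical): one array of (index, value) pairs, sorted by index.
template <typename T>
struct PairSparse {
    std::vector<std::pair<std::size_t, T>> data;

    // Returns a pointer to the stored value, or nullptr if i is not stored.
    const T* find(std::size_t i) const {
        // Every probe strides over a whole pair, so a large T spreads the
        // indices further apart in memory.
        auto it = std::lower_bound(
            data.begin(), data.end(), i,
            [](const std::pair<std::size_t, T>& p, std::size_t key) {
                return p.first < key;
            });
        if (it == data.end() || it->first != i) return nullptr;
        return &it->second;
    }
};
```

Both do the same O(log n) search. Any difference would come from how many cache lines the probes touch, which is something to measure rather than assume.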

It seems to me that vector of pairs is the better solution, yet both containers chose the other solution. Why?

+1  A: 

Having indices in a separate list would make them faster to look up - as you suggest, it would use the cache more effectively, particularly if T is large.

If you want to implement your own, why not just use std::map (or std::unordered_map)? Keys would be larger but implementation time would be close to zero!
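Something along these lines would probably be enough to get started. This is only a rough sketch: MapSparseVector, get, set and stored are names invented for the example, and it assumes T has a default constructor and operator==.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical minimal sparse vector built on std::map.
template <typename T>
class MapSparseVector {
public:
    explicit MapSparseVector(std::size_t size) : size_(size) {}

    // Read: indices with no entry report the default value T().
    T get(std::size_t i) const {
        assert(i < size_);
        auto it = data_.find(i);
        return it == data_.end() ? T() : it->second;
    }

    // Write: storing T() erases the entry, so only non-default values are kept.
    void set(std::size_t i, const T& v) {
        assert(i < size_);
        if (v == T()) data_.erase(i);
        else data_[i] = v;
    }

    std::size_t size() const { return size_; }          // logical length
    std::size_t stored() const { return data_.size(); } // entries actually held

private:
    std::size_t size_;
    std::map<std::size_t, T> data_;
};
```

Swapping std::map for std::unordered_map should only require changing the member type and the include, at the cost of losing ordered iteration.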

Matt Curtis
+1  A: 

Indeed, it seems that they reinvented the wheel (so to speak).

I would personally consider 2 libraries for your need:

  • Loki, for Loki::AssocVector -> the interface of a map implemented over a vector (which is what you wish to do)
  • Boost.Iterator, for its iterator_adaptor class. Makes it very easy to implement a new container by Composition.

As a remark, you may wish to be a little more generic than "values different from T()", because that requires T to be DefaultConstructible. You could provide a constructor that takes a T const& to use as the default value. When writing a generic container, it is good to reduce the requirements on T as much as possible (as long as it does not hurt performance).
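A minimal sketch of that idea, assuming a sorted vector of pairs as storage. The member names (empty_, get, set, stored) and the Label type are invented for the example; T only needs to be copyable and comparable with operator==.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical element type with no default constructor.
struct Label {
    int id;
    explicit Label(int x) : id(x) {}
    bool operator==(Label const& o) const { return id == o.id; }
};

template <typename T>
class sparse_vector {
    typedef std::vector<std::pair<std::size_t, T> > storage;

public:
    // The "empty" value comes from the caller, so T() is never needed.
    sparse_vector(std::size_t size, T const& empty)
        : size_(size), empty_(empty) {}

    // Read: indices with no entry report the caller-supplied empty value.
    T const& get(std::size_t i) const {
        assert(i < size_);
        typename storage::const_iterator it = lower(i);
        return (it != data_.end() && it->first == i) ? it->second : empty_;
    }

    // Write: storing the empty value erases the entry.
    void set(std::size_t i, T const& v) {
        assert(i < size_);
        typename storage::const_iterator it = lower(i);
        bool found = it != data_.end() && it->first == i;
        if (v == empty_) {
            if (found) data_.erase(it);
        } else if (found) {
            data_[static_cast<std::size_t>(it - data_.cbegin())].second = v;
        } else {
            data_.insert(it, std::make_pair(i, v));  // keeps data_ sorted
        }
    }

    std::size_t stored() const { return data_.size(); }

private:
    // First entry whose index is >= i (binary search on the sorted storage).
    typename storage::const_iterator lower(std::size_t i) const {
        return std::lower_bound(
            data_.begin(), data_.end(), i,
            [](std::pair<std::size_t, T> const& p, std::size_t key) {
                return p.first < key;
            });
    }

    std::size_t size_;
    T empty_;
    storage data_;
};
```

With this, something like sparse_vector<Label> v(8, Label(-1)) works even though Label() does not compile.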

Also, keep in mind that using a vector for storage works very well for a small number of values, but you might wish to switch the underlying container to a classic map or unordered_map if the number of values grows. It could be worth profiling/timing. Note that the STL lets you choose the underlying container in its Container Adapters, such as stack; doing the same here could make the implementation slightly harder.

Have fun.

Matthieu M.
thanks for pointing out iterator_adaptor -- did not know about it.
Neil G