ansaurus

Question

Performance: vector of classes or a class containing vectors

Answer 1

+7 A:

As the implementation of both variants seems easy enough I would build both versions and profile them to find the fastest one.

Empirical data usually beats speculation.
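A minimal sketch of such a comparison, assuming two hypothetical layouts (the names Item, Items, flatten, sumAll and secondsTaken are illustrative and not taken from the question's code). It only shows the shape of a timing harness; a real measurement would use the actual classes, realistic sizes and many repetitions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical layout 1: a vector of small objects, each owning its own vector.
struct Item {
    std::vector<double> values;
};

// Hypothetical layout 2: one object holding every value contiguously.
struct Items {
    std::vector<double> values;        // all values, back to back
    std::vector<std::size_t> offsets;  // item i occupies [offsets[i], offsets[i + 1])
};

// Build layout 2 from layout 1 so both hold exactly the same data.
Items flatten(const std::vector<Item>& items) {
    Items flat;
    flat.offsets.push_back(0);
    for (const Item& item : items) {
        flat.values.insert(flat.values.end(), item.values.begin(), item.values.end());
        flat.offsets.push_back(flat.values.size());
    }
    assert(flat.offsets.size() == items.size() + 1);
    return flat;
}

// The same work expressed against each layout.
double sumAll(const std::vector<Item>& items) {
    double total = 0.0;
    for (const Item& item : items)
        for (double v : item.values) total += v;
    return total;
}

double sumAll(const Items& items) {
    double total = 0.0;
    for (double v : items.values) total += v;
    return total;
}

// Run a callable once and return the elapsed wall-clock time in seconds.
template <typename F>
double secondsTaken(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling secondsTaken with a lambda that runs sumAll on each layout, and keeping the results so the compiler cannot discard the work, gives two numbers to compare on the target machine.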

lothar 2009-04-23 00:38:17

Answer 2

A:

As lothar says, you really should test it out. But to answer your last question, yes, cache misses will be a major concern here.

Also, it seems that your first implementation would run into load-hit-store stalls as coded, but I'm not sure how much of a problem that is on x86 (it's a big problem on XBox 360 and PS3).

Not Sure 2009-04-23 00:52:18

Answer 3

+2 A:

As a side issue: Currently, the find() in your innermost loop does a linear scan through all elements of classes[i].sumVectorC until it finds a matching value. If that vector contains many values, and you have no reason to believe that testVal appears near the start of the vector, then this will be slow -- consider using a container type with faster lookup instead (e.g. std::map or one of the nonstandard but commonly implemented hash_map types).

As a general guideline: consider algorithmic improvements before low-level implementation optimisation.
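To illustrate the difference, here is a small sketch that assumes only membership matters, so it uses std::set and std::unordered_set (the standardised successor of the hash_map types) rather than a map. The function names are hypothetical, and exact equality on doubles is kept only because that is what find() does.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <unordered_set>
#include <vector>

// Linear scan, as std::find performs on a vector: O(n) comparisons.
bool containsLinear(const std::vector<double>& values, double testVal) {
    return std::find(values.begin(), values.end(), testVal) != values.end();
}

// Build a tree-based index once; duplicates collapse into one entry.
std::set<double> buildIndex(const std::vector<double>& values) {
    std::set<double> index(values.begin(), values.end());
    assert(index.size() <= values.size());
    return index;
}

// Tree lookup: O(log n) comparisons per query.
bool containsOrdered(const std::set<double>& index, double testVal) {
    return index.find(testVal) != index.end();
}

// Hash lookup: O(1) on average per query.
bool containsHashed(const std::unordered_set<double>& index, double testVal) {
    return index.find(testVal) != index.end();
}
```

For vectors of only a handful of elements the linear scan may still win in practice, so this is worth measuring rather than assuming.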

j_random_hacker 2009-04-23 08:49:28

Answer 4

+1 A:

It looks like optimizing the find() would be a big win (profile to know for sure). Depending on the various sizes, in addition to replacing the vector with another container, you could try sorting sumVectorC and using a binary search in the form of lower_bound. This will turn your linear search O(n) into O(log n).
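A minimal sketch of the sorted-vector approach, assuming the vector can be sorted once up front and is then queried many times (containsSorted is a hypothetical helper name):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Binary search for testVal in a vector that is already sorted ascending.
bool containsSorted(const std::vector<double>& sorted, double testVal) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    // lower_bound returns the first element that is not less than testVal.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), testVal);
    // That element is a match only if it exists and compares equal.
    return it != sorted.end() && *it == testVal;
}
```

The vector has to be re-sorted (or the new element inserted at its lower_bound position) whenever it changes, which is the cost to weigh against the faster lookups.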

KeithB 2009-04-23 13:15:34

Answer 5

A:

If you can guarantee that std::numeric_limits<double>::infinity() is not a possible value, you can keep the arrays sorted with a dummy infinite entry at the end and hand-code the find so that the loop condition is a single test:

 array[i]<test_val

followed by an equality test once the loop exits.

Then the average number of values examined is (size()+1)/2 in the not-found case. Of course, if the search array changes very frequently, keeping it sorted becomes a cost of its own.
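The sentinel search described above might be sketched as follows. The function name and the use of std::vector are assumptions for illustration; the preconditions (sorted data, a trailing +infinity entry, a finite search value) are checked with asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Linear search in an ascending array whose last element is a +infinity sentinel.
bool containsWithSentinel(const std::vector<double>& a, double testVal) {
    const double inf = std::numeric_limits<double>::infinity();
    assert(!a.empty() && a.back() == inf);  // the sentinel must be present
    assert(testVal < inf);                  // also rejects NaN
    std::size_t i = 0;
    // One test per element: the sentinel guarantees the loop stops in bounds.
    while (a[i] < testVal) ++i;
    // A single equality test decides between found and not found.
    return a[i] == testVal;
}
```

Because the data is sorted, a search for a missing value stops at the first larger element instead of scanning to the end.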

Of course, you don't tell us much about sumVectorC or the rest of A, so it is hard to give really good advice. For example, if sumVectorC is never updated, it is probably possible to find an EXTREMELY cheap hash (e.g. a cast to ULL plus bit extraction) that is perfect on the sumVectorC values and fits into a double[8]. Then the overhead is a bit extract and 1 comparison versus 3 or 6.
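One way such a hash could look, as a sketch under stated assumptions: the "cast" is taken to mean reinterpreting the double's 64 bits (via memcpy), the hash is a 3-bit window at some shift, and the builder simply tries every shift until the values land in distinct slots. TinyPerfectHash and buildTinyPerfectHash are hypothetical names. The values must be distinct, and +0.0 and -0.0 would hash differently despite comparing equal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit doubles");

struct TinyPerfectHash {
    double slots[8];  // unused slots hold NaN, which never compares equal
    unsigned shift;   // position of the 3-bit window used as the hash

    static std::uint64_t bitsOf(double v) {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);  // well-defined bit reinterpretation
        return bits;
    }
    static std::size_t slotOf(double v, unsigned shift) {
        return static_cast<std::size_t>((bitsOf(v) >> shift) & 7u);
    }
    // One bit extraction and one comparison per lookup.
    bool contains(double testVal) const {
        return slots[slotOf(testVal, shift)] == testVal;
    }
};

// Try each shift; succeed if some 3-bit window gives every value its own slot.
bool buildTinyPerfectHash(const std::vector<double>& values, TinyPerfectHash& out) {
    assert(values.size() <= 8);
    for (unsigned shift = 0; shift <= 61; ++shift) {
        bool used[8] = {false};
        bool collision = false;
        for (double v : values) {
            std::size_t s = TinyPerfectHash::slotOf(v, shift);
            if (used[s]) { collision = true; break; }
            used[s] = true;
        }
        if (collision) continue;
        for (double& slot : out.slots) slot = std::numeric_limits<double>::quiet_NaN();
        for (double v : values) out.slots[TinyPerfectHash::slotOf(v, shift)] = v;
        out.shift = shift;
        return true;
    }
    return false;  // no single window separates these values
}
```

Not every set of values admits such a window, so a fallback (for example the sentinel search) is still needed when the builder returns false.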

Also, if you have a reasonable bound on sumVectorC.size() (you mentioned 4 or 5, so this assumption seems not bad), you could consider using an aggregated array, or even just a boost::array<double, N>, and adding your own dynamic size, e.g.:

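template <std::size_t N>
class AggregatedArray : public boost::array<double, N> {
   std::size_t _size;   // number of elements currently in use (at most N)
public:
   std::size_t size() const {
      return _size;     // the member, not the function name
   }
   // ....
   // push_back(..){...
   // pop(){...
   // resize(...){...
};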

This gets rid of the extra cache line access to the separately allocated array data for sumVectorC.

If sumVectorC is updated very infrequently and finding a perfect hash (out of your class of hash algorithms) is relatively cheap, you can profitably pay that cost whenever sumVectorC changes. These small lookups can be problematic, and algorithmic complexity is frequently irrelevant: it is the constants that dominate. It is an engineering problem and not a theoretical one.

Unless you can guarantee that the small maps are in cache, you can almost be guaranteed that using a std::map will yield roughly 170% worse performance, as pretty much each node in the tree will be in a separate cache line.

Thus instead of accessing (4 times 1 + 1 times 2)/5 = 1.2 cache lines per search (the first 4 entries are in the first cache line, the 5th in the second), you will access (1 + 2 times 2 + 2 times 3)/5 = 11/5 = 2.2, plus 1 for the tree itself, = 3.2 cache lines per search (the 1 being the node at the root, the next 2 nodes being children of the root, and the last 2 being grandchildren of the root).

So I would predict using a std::map to take 3.2/1.2 = 267% as long for a sumVectorC having 5 entries.

This is what I meant when I said: "It is an engineering problem and not a theoretical one."

2009-04-26 02:36:02
