To implement option 2, define an enumerated type for the serotypes within another class, then write a templated key extractor. I'm not certain that boost::multi_index_container
supports reindexing (I doubt it does), so this approach may be doomed from the start.
class ... {
enum Serotype {
...
sup // 1 greater than the highest serotype
};
...
};
template <class H, typename H::Serotype s>
struct serotype_key_extractor {
typedef bool result_type;
result_type operator()(const boost::shared_ptr<Host>& h) const
{ ! return h->isInfected(s); }
};
Of course, this requires an index for each serotype on your multi_index_container
(Serotype::sup
total), with a probable performance of O(I * lg(n)) for each index over the lifetime of the simulation (where I is the number of infection events and n is the number of hosts). If there are a one or two common queries, you could use composite keys, with the serotype key extractors one component.
Option 4, keeping track of infected hosts in a map of sets std::map<H::Serotype, std::set<boost::shared_ptr<Host> > >
, can be made more performant over infection events at the cost of querying. Insertion to/removal from a set might be O(lg(n)), so it might be better to use a hash or list rather than a set, though this will increase the cost of queries. Since you're running a simulation, keeping this structure synchronized isn't too bad. Update it during an infection (or cure, if they exist) event. If you're using event based programming, it's quite easy.
For querying with option 4, you could build other sets based on the other attributes, then find the intersection of those sets. This is O(f(I)+f(n)+g(S,T))
overall, where f(x)
is the cost of inserting x
elements, g(X,Y)
is the cost of intersecting sets and S
and T
are the size of the sets (the O(f(I)+f(N))
comes from building all the sets; I'm playing it a little loose with the timing, since I might be assuming f(x) is homomorphic, which it likely isn't). At best, f(x)=x
and g(X,Y)=X+Y
, depending on your set implementation. Alternatively, fold over one of the sets (e.g. find all the hosts of a particular age, then count infected; count_if
is a particular fold you could use, where count_if(start, end, pred)
is equivalent to foldl(start, end, [](int count, const boost::shared_ptr<Host>& h){return count + pred(h); })
). In the latter case, make use of the most restrictive index to reduce the sub-population you need to scan as much as possible. This is O(f(I)+S) overall.
Part of the difficulty in designing a performant solution is that you can really only make use of one independent index at a time. Within the groups from an index, other attributes you might index on are arranged essentially randomly. If two indices aren't independent (those that are geographically close are more likely to have the same infections), you might be able to make use of both simultaneously, though this doesn't gain you much: you can get most of the sub-population early in the query, but you'd still have to examine the rest of the group from the primary index.
If you're not married to C++, you could switch to C# and handle the queries with LINQ.
As for switching to a database, as mathmike suggests, you'd probably only want to use an object-oriented, in-memory database. Christopher Brown lists some, including FastDB and OOFile, though I can't say how suitable they are for your project.