views:

521

answers:

3

Problem

I have timestamped data, which I need to search based on the timestamp in order to get the one existing timestamp which matches my input timestamp the closest.
Preferably this should be solved with the STL. boost::* or stl::tr1::* (from VS9 with Featurepack) are also possible.
Example of timestamped data:

struct STimestampedData
{
 time_t m_timestamp; // Sorting criterion
 CData m_data;       // Payload
}

Approach with stl::vector, sort() and equal_range()

Since a map or set only allows me to find exact matches, I don't get any further using one of these. So now I have a vector to which I append data as it is coming in. Before searching I use <algorithm>'s sort() and supply it with a custom comparison function.
After that I use <algorithm>'s equal_range() to find the two neighbors of a specified value x. From these two values I check which one is closest to x and then I have my best match.


While this is not too complex, I wonder if there are more elegant solutions to this.
Maybe the STL already has an algorithm which does exactly that so I'm not re-inventing something here?

Update: Linear vs. binary search

I forgot to mention that I have quite a lot of data to handle so I don't want to have to search linearly.
The reason I am sorting a vector with sort() is because it has random access iterators which is not the case with a map. Using a map would not allow equal_range() to do a search with twice logarithmic complexity.
Am I correct?

A: 

Depending on what your usage is, you could do a simple linear search instead of a sort. Come up with a "distance" function, loop through keeping track of the best match so far, and its distance. When you find a better match, forget the previous one, and keep the new one and its distance. When you've looped through everything, you have your match.

This works out to be O(N*S) where N is the number of items in the vector and S is the number of searches.

Your current way is O((N+S)*LogN) which is greater if the number of searches is small and bounded. Otherwise the sort / binary search is better.

Eclipse
+7  A: 

I would use equal_range too for such a thing.

If you are using sort() every time on your vector it might be better to use a map (or set), as that's always sorted automatically, and use the member equal_range

But that depends on the the amount of inserts / queries / amount of data. (although for something that always needs to be sorted when I query, a map would be my first choice, and I'd only use a vector if there was a very good reason)

Pieter
+4  A: 

I would use set::lower_bound to find the matching or greater value, then decrement the iterator to check the next lower value. You should use std::set rather than std::map since your key is embedded in the object - you'll need to provide a functor that compares the timestamp members.

struct TimestampCompare
{
    bool operator()(const STimestampedData & left, const STimestampedData & right) const
    {
        return left.m_timestamp < right.m_timestamp;
    }
};
typedef std::set<STimestampedData,TimestampCompare> TimestampedDataSet;

TimestampedDataSet::iterator FindClosest(TimestampedDataSet & data, STimestampedData & searchkey)
{
    TimestampedDataSet::iterator upper = data.lower_bound(searchkey);
    if (upper == data.begin() || upper->m_timestamp == searchkey.m_timestamp)
        return upper;
    TimestampedDataSet::iterator lower = upper;
    --lower;
    if (upper == data.end() ||
            (searchkey.m_timestamp - lower->m_timestamp) < (upper->m_timestamp - searchkey.m_timestamp))
        return lower;
    return upper;
}
Mark Ransom
Great, this is quite exactly what I had in mind. Thanks!
mxp