views:

360

answers:

5

Hi, I'm trying to work out the best method to search a vector of type "Tracklet" (a class I have built myself) to find the first and last occurrence of a given value for one of its variables. For example, I have the following classes (simplified for this example):

class Tracklet {
    TimePoint *start;
    TimePoint *end;
    int angle;

    public:
        Tracklet(CvPoint*, CvPoint*, int, int);
}

class TimePoint {
    int x, y, t;

    public:
        TimePoint(int, int, int);
        TimePoint(CvPoint*, int);
        // Relevant getters and setters exist here   
};

I have a vector "vector<Tracklet> tracklets" and I need to search for any tracklets with a given value of "t" for the end timepoint. The vector is ordered in terms of end time (i.e. tracklet.end->t).

I'm happy to code up a search algorithm, but am unsure of which route to take with it. I'm not sure binary search would be suitable, as I seem to remember it won't necessarily find the first. I was thinking of a method where I use binary search to find an index of an element with the correct time, then iterate back to find the first and forward to find the last. I'm sure there's a better way than that, since it wastes binary searches O(log n) by iterating.

Hopefully that makes sense: I struggled to explain it a bit! Cheers!

+1  A: 

A binary search seems like your best option here, as long as your vector remains sorted. It's essentially identical, performance-wise, to performing a lookup in a binary tree-structure.

Charles Salvia
A: 

The vector is ordered in terms of time

The start time or the end time?

What is wrong with a naive O(n) search? Remember you are only searching and not sorting. You could use a sorted container as well (if that doesn't go against the basic design).

dirkgently
End time, sorry, will edit the question.The list is going to be very large (upwards of 2000 elements) and efficiency is very important (I need to process this approx. once a second, with as little program wait as possible), so I'm trying to minimise the search time as much as possible.
ianhales
What sort of efficiency: insert/delete/find? If it's Which means a std::set is better suited for this purpose (or even a hash_map) if you want to have quick lookup. I have had signifanct performance gains when moving from `vector` to `set` (where the size of the data set ~10K). Also, see http://www.bigzaphod.org/set.vs.vector/
dirkgently
See http://www.sgi.com/tech/stl/complexity.html and the SO question: http://stackoverflow.com/questions/181693/what-are-the-complexity-guarantees-of-the-standard-containers
dirkgently
+4  A: 

I'd just use find and find_end, and then do something more complicated only if testing showed it to be too slow.

If you're really concerned about lookup performance, you might consider a different data structure, like a map with timestamp as the key and a vector or list of elements as the value.

Kristopher Johnson
I'm concerned that the time complexity of find and find_end are quite high. The vector will be large, and I really need to minimise lag.
ianhales
+5  A: 

If the vector is sorted and contains the value, std::lower_bound will give you an iterator to the first element with a given value and std::upper_bound will give you an iterator to one element past the last one containing the value. Compare the value with the returned element to see if it existed in the vector. Both these functions use binary search, so time is O(logN).

To compare on tracklet.end->t, use:

bool compareTracklets(const Tracklet &tr1, const Tracklet &tr2) {
    return (tr1.end->t < tr2.end->t);
}

and pass compareTracklets as the fourth argument to lower_bound or upper_bound

interjay
That might be possible, but how would I make it check on `tracklet.end->t`? I'm relatively new to C++ :)
ianhales
You use a comparison function or object. I've edited my answer with an example.
interjay
+1  A: 

dirkgently referred to a sweet optimization comparative. But I would in fact not use a std::vector for this.

Usually, when deciding to use a STL container, I don't really consider the performance aspect, but I do consider its interface regarding the type of operation I wish to use.

std::set<T>::find
std::set<T>::lower_bound
std::set<T>::upper_bound
std::set<T>::equal_range

Really, if you want an ordered sequence, outside of a key/value setup, std::set is just easier to use than any other.

  • You don't have to worry about inserting at a 'bad' position
  • You don't have problems of iterators invalidation when adding / removing an element
  • You have built-in methods for searching

Of course, you also want your Comparison Predicate to really shine (hopes the compiler inlines the operator() implementation), in every case.

But really, if you are not convinced, try a build with a std::vector and manual insertion / searching (using the <algorithm> header) and try another build using std::set.

Compare the size of the implementations (number of lines of code), compare the number of bugs, compare the speed, and then decide.

Most often, the 'optimization' you aim for is actually a pessimization, and in those rares times it's not, it's just so complicated that it's not worth it.

Optimization:

  • Don't
  • Expert only: Don't, we mean it
Matthieu M.