I have a List<Cat> sorted by the cats' birthdays. Is there an efficient Java Collections way of finding all the cats that were born on January 24th, 1983? Or, what is a good approach in general?

+6  A: 

Collections.binarySearch().

Assuming the cats are sorted by birthday, this will give the index of one of the cats with the correct birthday. From there, you can iterate backwards and forwards until you hit one with a different birthday.

If the list is long and/or not many cats share a birthday, this should be a significant win over straight iteration.

Here's the sort of code I'm thinking of. Note that I'm assuming a random-access list; for a linked list, you're pretty much stuck with iteration. (Thanks to fred-o for pointing this out in the comments.)

// needs java.util.ArrayList, java.util.Collections, java.util.List
List<Cat> cats = ...; // sorted by birthday
List<Cat> catsWithSameBirthday = new ArrayList<Cat>();
Cat key = new Cat();
key.setBirthday(...);
// assumes Cat implements Comparable<Cat>, ordered by birthday
final int index = Collections.binarySearch(cats, key);
if (index < 0)
    return catsWithSameBirthday; // no cat has this birthday
catsWithSameBirthday.add(cats.get(index));
// go backwards, down to and including index 0
for (int i = index-1; i >= 0; i--) {
    if (cats.get(i).getBirthday().equals(key.getBirthday()))
        catsWithSameBirthday.add(cats.get(i));
    else
        break;
}
// go forwards
for (int i = index+1; i < cats.size(); i++) {
    if (cats.get(i).getBirthday().equals(key.getBirthday()))
        catsWithSameBirthday.add(cats.get(i));
    else
        break;
}
return catsWithSameBirthday;
Michael Myers
Collections.binarySearch() returns the index of a single element and makes no guarantees about which one it picks among elements that are considered identical.
Jake
Maybe that will teach me to read the question before answering. :)
Michael Myers
Also, Collections.binarySearch() is only efficient for random access lists.
fred-o
Must have index < 0, but yeah: this is the general idea.
Jake
@Jake: Good point, that wouldn't have worked very well. I fixed the code.
Michael Myers
+5  A: 

Binary search is the classic way to go.

Clarification: I said to use binary search, not any one method in particular. The algorithm is:

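//pseudocode:

index = binarySearchToFindTheIndex(date);
if (index < 0)
  return empty; // not found

// walk left to the first cat with this date
start = index;
while (start > 0 && cats[start - 1].date == date) --start;
// walk right to the last cat with this date
end = index;
while (end < cats.length - 1 && cats[end + 1].date == date) ++end;

return cats[ start .. end ]; // inclusive on both ends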
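
For reference, here is a minimal Java sketch of the same idea. It is only an illustration under some assumptions: the Cat class, its fields, and the bornOn method are hypothetical (the question does not show the real Cat type), java.time.LocalDate stands in for whatever date type the cats actually use, and the list is assumed to be a random-access list sorted by birthday in ascending order.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class BirthdaySearch {

    // Hypothetical Cat type; the real one is not shown in the question.
    static final class Cat {
        final String name;
        final LocalDate birthday;

        Cat(String name, LocalDate birthday) {
            this.name = name;
            this.birthday = birthday;
        }
    }

    // Returns a view of all cats born on the given date.
    // Assumes cats is sorted by birthday (ascending) and supports fast get(i).
    static List<Cat> bornOn(List<Cat> cats, LocalDate date) {
        Comparator<Cat> byBirthday = (a, b) -> a.birthday.compareTo(b.birthday);
        // The key is a dummy cat that only carries the date we are looking for.
        int index = Collections.binarySearch(cats, new Cat(null, date), byBirthday);
        if (index < 0) {
            return Collections.emptyList(); // no cat has this birthday
        }
        // Walk left to the first cat with this birthday.
        int start = index;
        while (start > 0 && cats.get(start - 1).birthday.equals(date)) {
            start--;
        }
        // Walk right to one past the last cat with this birthday.
        int end = index + 1;
        while (end < cats.size() && cats.get(end).birthday.equals(date)) {
            end++;
        }
        return cats.subList(start, end); // end is exclusive
    }

    public static void main(String[] args) {
        List<Cat> cats = new ArrayList<>();
        cats.add(new Cat("Ash", LocalDate.of(1980, 5, 1)));
        cats.add(new Cat("Bo", LocalDate.of(1983, 1, 24)));
        cats.add(new Cat("Cy", LocalDate.of(1983, 1, 24)));
        cats.add(new Cat("Di", LocalDate.of(1990, 12, 31)));
        for (Cat cat : bornOn(cats, LocalDate.of(1983, 1, 24))) {
            System.out.println(cat.name); // prints Bo, then Cy
        }
    }
}
```

The result is a subList view, so it is cheap to create but is backed by the original list. Copy it into a new ArrayList if the original list may change afterwards.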
Mehrdad Afshari
Collections.binarySearch() returns the index of a single element and makes no guarantees about which one it picks among elements that are considered identical.
Jake
I didn't say you should use the `Collections.binarySearch` method. Use binary search to find the index of a single element. All other elements with the same birthday are adjacent to the one found, so you can collect them with a simple loop. It's a classic.
Mehrdad Afshari
@Mehrdad - update the answer to reflect this and you'll earn an upvote.
slim
A: 

Unless you have somehow indexed the collection by date, the only way would be to iterate over all of them.

Nuno Furtado
It's sorted by date. What else would you possibly call an index!?
Mehrdad Afshari
I doubt that is the *only* way. Certainly you can imagine a "low level" algorithm that does this very efficiently by finding the first occurrence of Cats with a given birthday and proceeding linearly from there.
Jake
Jake: whatever algorithm you use to find the first element will be O(n) unless it does some form of binary search. The asymptotically fastest approach is to binary search and then scan linearly for the start and end.
Mehrdad Afshari
A: 

If you need a really fast search use a HashMap with the birthday as a key. If you need to have the keys sorted use a TreeMap.

Because you want to allow multiple cats to have the same birthday, you need to use a Collection as the value in the Hash/TreeMap, e.g.

      Map<Date,Collection<Cat>>
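
A rough sketch of how such a map could be built and queried is below. The Cat class and the indexByBirthday and bornOn helpers are hypothetical names made up for illustration, and java.time.LocalDate is assumed in place of Date. A TreeMap is used here so the keys stay sorted; swapping in a HashMap should work the same way if ordering does not matter.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BirthdayIndex {

    // Hypothetical Cat type; the real one is not shown in the question.
    static final class Cat {
        final String name;
        final LocalDate birthday;

        Cat(String name, LocalDate birthday) {
            this.name = name;
            this.birthday = birthday;
        }
    }

    // Groups the cats by birthday in one pass over the list.
    static Map<LocalDate, List<Cat>> indexByBirthday(List<Cat> cats) {
        Map<LocalDate, List<Cat>> index = new TreeMap<>();
        for (Cat cat : cats) {
            // Create the bucket for this birthday on first use, then add the cat.
            index.computeIfAbsent(cat.birthday, d -> new ArrayList<>()).add(cat);
        }
        return index;
    }

    // Looks up all cats born on the given date; empty list if there are none.
    static List<Cat> bornOn(Map<LocalDate, List<Cat>> index, LocalDate date) {
        return index.getOrDefault(date, Collections.emptyList());
    }

    public static void main(String[] args) {
        List<Cat> cats = new ArrayList<>();
        cats.add(new Cat("Bo", LocalDate.of(1983, 1, 24)));
        cats.add(new Cat("Ash", LocalDate.of(1980, 5, 1)));
        cats.add(new Cat("Cy", LocalDate.of(1983, 1, 24)));
        Map<LocalDate, List<Cat>> index = indexByBirthday(cats);
        System.out.println(bornOn(index, LocalDate.of(1983, 1, 24)).size()); // prints 2
    }
}
```

Building the map costs one pass over all cats, so this mainly pays off when the same collection is queried many times.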
siddhadev
A: 

Google Collections can do what you want by using a Predicate and creating a filtered collection where the predicate matches dates.
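
To show the shape of the predicate approach without pulling in the library, here is a small sketch that uses java.util.function.Predicate and streams from the standard library as a stand-in; it is not the Google Collections API itself. The Cat class and the filter and bornOn helpers are hypothetical, and java.time.LocalDate is assumed for the birthday.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class BirthdayFilter {

    // Hypothetical Cat type; the real one is not shown in the question.
    static final class Cat {
        final String name;
        final LocalDate birthday;

        Cat(String name, LocalDate birthday) {
            this.name = name;
            this.birthday = birthday;
        }
    }

    // Predicate that matches cats born on the given date.
    static Predicate<Cat> bornOn(LocalDate date) {
        return cat -> cat.birthday.equals(date);
    }

    // Applies the predicate to every cat, so this is a linear scan.
    static List<Cat> filter(List<Cat> cats, Predicate<Cat> predicate) {
        return cats.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Cat> cats = new ArrayList<>();
        cats.add(new Cat("Ash", LocalDate.of(1980, 5, 1)));
        cats.add(new Cat("Bo", LocalDate.of(1983, 1, 24)));
        cats.add(new Cat("Cy", LocalDate.of(1983, 1, 24)));
        for (Cat cat : filter(cats, bornOn(LocalDate.of(1983, 1, 24)))) {
            System.out.println(cat.name); // prints Bo, then Cy
        }
    }
}
```

This works on unsorted collections too, but it checks every element and does not take advantage of the list already being sorted.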

basszero
Does it filter the Collection in O(n)?
Jake
I assume it would have to since it would apply the predicate to each element. A binary search (like the upvoted answer) is best if the list is sorted.
basszero