views:

511

answers:

8

I have a set of time stamped values I'd like to place in a sorted set.

public class TimedValue {
    public Date time;
    public double value;

    public TimedValue(Date time, double value) {
        this.time = time;
        this.value = value;
    }
}

The business logic for sorting this set says that values must be ordered in descending value order, unless it's more than 7 days older than the newest value.

So as a test, I came up with the following code...

DateFormat dateFormatter = new SimpleDateFormat("MM/dd/yyyy");
TreeSet<TimedValue> mySet = new TreeSet<TimedValue>(new DateAwareComparator());
mySet.add(new TimedValue(dateFormatter.parse("01/01/2009"), 4.0 )); // too old
mySet.add(new TimedValue(dateFormatter.parse("01/03/2009"), 3.0)); // Most relevant
mySet.add(new TimedValue(dateFormatter.parse("01/09/2009"), 2.0));

As you can see, initially the first value is more relevant than the second, but once the final value is added to the set, the first value has expired and should be the least relevant.

My initial tests say that this should work... that the TreeSet will dynamically reorder the entire list as more values are added.

But even though I see it, I'm not sure I believe it.

Will a sorted collection reorder the entire set as each element is added? Are there any gotchas to using a sorted collection in this manner (i.e performance)? Would it be better to manually sort the list after all values have been added (I'm guessing it would be)?



Follow-up: As many (and even I to a certain extent) suspected, the sorted collection does not support this manner of "dynamic reordering". I believe my initial test was "working" quite by accident. As I added more elements to the set, the "order" broke down quite rapidly. Thanks for all the great responses, I refactored my code to use approaches suggested by many of you.

+3  A: 

I don't believe the JDK libraries or even 3rd party libraries are written to handle a comparator whose results are not consistent. I wouldn't depend on this working. I would worry more if your Comparator can return not-equal for two values when called one time and can return equal for the same two values if called later.

Read carefully the contract of Comparator.compare(). Does your Comparator satisfy those constraints?

To elaborate, if your Comparator returns that two values are not equals when you call it once, but then later returns that the two values are equal because a later value was added to the set and has changed the output of the Comparator, the definition of "Set" (no duplicates) becomes undone.

Jon Skeet's advice in his answer is excellent advice and will avoid the need to worry about these sorts of problems. Truly, if your Comparator does not return values consistent with equals() then you can have big problems. Whether or not a sorted set will re-sort each time you add something, I wouldn't depend on, but the worst thing that would occur from order changing is your set would not remain sorted.

Eddie
As far as I can see, the contract doesn't specify that the ordering imposed by the comparator has to stay constant over time. But I can't imagine that you could get away with it in the general case.
JesperE
I am most concerned with whether or not the Comparator is consistent with equals(). If not, chaos may occur.
Eddie
+2  A: 

I am 99% certain this will not work. If a value in the Set suddenly changes its comparison behaviour, it is possible (quite likely, actually) that it will not be found anymore; i.e. set.contains(value) will return false, because the search algorithm will at one point do a comparison and continue in the wrong subtree because that comparison now returns a different result than it did when the value was inserted.

Michael Borgwardt
+1  A: 

I think the non-changing nature of a Comparator is supposed to be on a per-sort basis, so as long as you are consistent for the duration of a given sorting operation, you are ok (so long as none of the items cross the 7 day boundary mid-sort).

However, you might want to make it more obvious that you are asking specifically about a TreeSet, which I imagine re-uses information from previous sorts to save time when you add a new item so this is a bit of a special case. The TreeSet javadocs specifically defer to the Comparator semantics, so you are probably not officially supported, but you'd have to read the code to get a good idea of whether or not you are safe.

I think you'd be better off doing a complete sort when you need the data sorted, using a single time as "now" so that you don't risk jumping that boundary if your sort takes long enough to make it likely.

Jeremy Huiskamp
+10  A: 

I don't see how your comparator can even detect the change, unless it remembers the newest value it's currently seen - and that sounds like an approach which is bound to end in tears.

I suggest you do something along the following lines:

  • Collect your data in an unordered set (or list)
  • Find the newest value
  • Create a comparator based on that value, such that all comparisons using that comparator will be fixed (i.e. it will never return a different result based on the same input values; the comparator itself is immutable although it depends on the value originally provided in the constructor)
  • Create a sorted collection using that comparator (in whatever way seems best depending on what you then want to do with it)
Jon Skeet
+2  A: 

No, this won't work.

If you are using comparable keys in a collection, the results of the comparison between two keys must remain the same over time.

When storing keys in a binary tree, each fork in the path is chosen as the result of the comparison operation. If a later comparison returns a different result, a different fork will be taken, and the previously stored key will not be found.

erickson
+4  A: 

I would advise against this for a few reasons:

  1. Since it's basically a red-black tree behind the scenes (which doesn't necessarily have to be rebuilt from scratch on every insertion), you might easily end up with values in the wrong part of the tree (invalidating most of the TreeSet API).
  2. The behavior is not defined in the spec, and thus may change later even if it's working now.
  3. In the future, when anything goes strangely wrong in anything remotely touching this code, you'll spend time suspecting that this is the cause.

I would recommend either recreating/resorting the TreeSet before searching it or (my preference) iterating through the set before the search and removing any of the objects that are too old. You could even, if you wanted to trade some memory for speed, keep a second list ordered by date and backed by the same objects so that you all you would have to do to filter your TreeSet is remove objects from the TreeSet based on the time-sorted list.

Austin Mills
+1 for pointing out that it's not specified and could thus change at any time in the future.
Eddie
+1  A: 

It's possible that a record will change from <7 days to >7 days mid-sort, so what you're doing violates the rules for a comparator. Of course this doesn't mean it won't work: lots of things that are documented as "unpredictable" in fact work if you know exactly what is happening internally.

I think the textbook answer is: This is not reliable with the built-in sorts. You would have to write your own sort function.

At the very least, I would say that you can't rely on a TreeSet or any "sorted structure" magically resorting itself when dates roll over the boundary. At best this might work if you re-sort just before displaying, and don't rely on anything remaining correct between updates.

At worst, inconsistent comparisons might break the sorts badly. You have no assurance that this won't put you into an infinite loop or some other deadly black hole.

So I'd say: Read the source code from Sun for whatever classes or functions you plan to use, and see if you can figure out what will happen. Testing is good, but there are potentially tricky cases that are difficult to test. THe most obvious is: What if while it's in the process of sorting, a record rolls over the date boundary? That is, it might look at a record once and say it's <7 but the next time it sees it it's >7. That could be bad, bad news.

One obvious trick that occurs to me: Convert the date to an age at the time you add the record to the structure, rather than dynamically. That way it can't change within the sort. If the structure is going to live for more than a few minutes, recalculate the ages at some appropriate time and then re-sort. I doubt someone will say your program is incorrect because you said a record was less than 7 days old when really it's 7 days, 0 hours, 0 minutes, and 2 seconds old. Even if someone noticed, how accurate is their watch?

Jay
+1  A: 

As already noted, the Comparator cannot do this for you, because the transitivity is violated. Basically, in order to be able to sort the items, you must be able to compare each two of them (independent of the rest), which obviously you cannot do. So your scenario basically either would not work or would produce not consistent result.

Maybe something simpler would be good enough for you:

  • apply simple Comparator that uses the Value as you need
  • and simply remove from your list/collection all elements that are 7 days older then the newest. Basically whenever a new item is added, you check if it is the newest, and if it is, remove those that are 7 days older then this.

This would not work if you also remove the items from the list, in which case you would need to keep all those you removed in a separate list (which by the way you would sort by date) and add those back to the original list in case the MAX(date) is smaller after removal.

van