Consider a class holding two doubles

class path_cost {
   double length;
   double time;
};

If I want to lexicographically order a list of path_costs, I have a problem. Read on :)

If I use exact equal for the equality test like so

bool operator<(const path_cost& rhs) const {
   if (length == rhs.length) return time < rhs.time;
   return length < rhs.length;
}

the resulting order is likely to be wrong, because a small deviation (e.g. due to numerical inaccuracies in the calculation of the length) may cause the length test to fail, so that e.g.

{ 231.00000000000001, 40 } < { 231.00000000000002, 10 }

erroneously holds.

If I alternatively use a tolerance like so

bool operator<(const path_cost& rhs) const {
   if (std::fabs(length - rhs.length) < 1e-6) return time < rhs.time;
   return length < rhs.length;
}

then the sorting algorithm may fail horribly, since the < operator is no longer transitive (that is, a < b and b < c no longer guarantee a < c).

Any ideas? Solutions? I have thought about partitioning the real line, so that numbers within each partition are considered equal, but that still leaves too many cases where the equality test fails but should not.

(UPDATE by James Curran, hopefully explaining the problem): Given the numbers:

  • A = {231.0000012000, 10}
  • B = {231.0000005000, 40}
  • C = {231.0000001000, 60}

    • A.Length & B.Length differ by 7e-7, so we use time, and A < B.
    • B.Length & C.Length differ by 4e-7, so we use time, and B < C.
    • A.Length & C.Length differ by 1.1e-6, so we use length, and A > C (a runnable sketch of this appears below).
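
For concreteness, here is a minimal, self-contained sketch that reproduces James's three values against the 1e-6 tolerance comparator from above (the members are made public in a struct purely to keep the example short):

#include <cmath>
#include <iostream>

struct path_cost {
   double length;
   double time;

   // Tolerance-based comparison as in the question, with a 1e-6 tolerance.
   bool operator<(const path_cost& rhs) const {
      if (std::fabs(length - rhs.length) < 1e-6) return time < rhs.time;
      return length < rhs.length;
   }
};

int main() {
   const path_cost a{231.0000012000, 10};
   const path_cost b{231.0000005000, 40};
   const path_cost c{231.0000001000, 60};

   std::cout << std::boolalpha
             << (a < b) << '\n'   // true:  lengths differ by 7e-7, so times decide (10 < 40)
             << (b < c) << '\n'   // true:  lengths differ by 4e-7, so times decide (40 < 60)
             << (a < c) << '\n';  // false: lengths differ by 1.1e-6, so lengths decide (a is longer)
   // a < b and b < c, yet not a < c: the operator is not transitive, so it is
   // not a strict weak ordering, and std::sort is allowed to misbehave on it.
}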

(Update by Esben Mose Hansen) This is not purely theoretical. The standard sort algorithms tend to crash, or worse, when given a non-transitive comparison operator. And this is exactly what I have been contending with (and boy, was that fun to debug ;) )

A: 

I'm not familiar with your application, but I'd be willing to bet that the differences in distance between points in your graph are many orders of magnitude larger than the rounding errors on floating point numbers. Therefore, if two entries differ by only the round-off error, they are essentially the same, and it makes no difference in which order they appear in your list. From a common-sense perspective, I see no reason to worry.

Carl Smotricz
As I wrote above, the problem is that when the transitivity fails, common sort algorithms are no longer well-defined, and in my case, they simply crash.
Esben Mose Hansen
You don't get that kind of failure if you don't try to adjust the numbers. If they are consistently off by tiny amounts, they will be consistently - and correctly - sorted. There is nothing mysterious about floating point numbers, they don't change their values in mid-comparison. Summary: Don't mess with the numbers and you won't have a transitivity problem.
Carl Smotricz
I did not adjust any numbers, so that was not the problem. As I recall, the problem was that a sorted range was inferred to contain a value, but actually didn't. Thus it ended up running off the end of the range and into unallocated memory => crash. You will note that e.g. std::sort requires the sorting operator to be a "strict weak ordering", which implies transitivity.
Esben Mose Hansen
A: 

You will never get 100% precision with ordinary doubles. You say that you are afraid that using tolerances will affect the correctness of your program. Have you actually tested this? What level of precision does your program actually need?

In most common applications I find a tolerance of something like 1e-9 suffices. Of course it all depends on your application. You can estimate the level of accuracy you need and just set the tolerance to an acceptable value.

If even that fails, it means that double is simply inadequate for your purposes. This scenario is highly unlikely, but can arise if you need very high precision calculations. In that case you have to use an arbitrary precision package (e.g. BigDecimal in Java or something like GMP for C). Again, only choose this option when there is no other way.

MAK
I have tried it, and it fails. The problem is that, with a certain probability, the transitivity fails, and the application segfaults. I have a workaround (involving using long double in an intermediate result), but I was wondering about the general problem. Surely someone has looked at this problem before? The tolerance is pretty close to what we use. The numbers come partly from a label-setting/A* algorithm, partly from a COIN-OR LP solution.
Esben Mose Hansen
+1  A: 

I don't think you are going to be able to do what you want. Essentially you seem to be saying that in certain cases you want to ignore the fact that a > b and pretend a = b. I'm pretty sure you can construct a proof showing that if a and b are considered equivalent whenever their difference is smaller than some tolerance, then (treating that equivalence as transitive) any a and b are equivalent. Something along the lines of:

For a tolerance of C and two numbers A and B where, without loss of generality, A > B, define D(n) = B + n*(C/10) for 0 <= n <= 10*(A-B)/C. Trivially, each D(n) is within the tolerance of D(n-1) and D(n+1) and is therefore equivalent to them. Since D(0) = B and D(10*(A-B)/C) = A, transitivity would force A and B to be equivalent.
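
A toy check of this chaining argument (the endpoints 231 and 232 and the 1e-6 tolerance are arbitrary values chosen just to make it concrete): every adjacent pair in the chain D(n) compares "equal" within tolerance, yet the endpoints clearly are not.

#include <algorithm>
#include <cmath>
#include <iostream>

int main() {
   const double C = 1e-6;    // tolerance
   const double B = 231.0;   // arbitrary endpoints with A > B
   const double A = 232.0;

   // Walk the chain D(n) = B + n*(C/10) from B up to A.
   const long long last = static_cast<long long>(std::ceil((A - B) / (C / 10)));
   bool adjacent_pairs_equal = true;
   for (long long n = 1; n <= last; ++n) {
      const double prev = B + (n - 1) * (C / 10);
      const double cur  = std::min(B + n * (C / 10), A);
      if (std::fabs(cur - prev) >= C) adjacent_pairs_equal = false;
   }

   std::cout << std::boolalpha
             << adjacent_pairs_equal << '\n'      // true:  every step is within tolerance
             << (std::fabs(A - B) < C) << '\n';   // false: yet B and A are nowhere near "equal"
}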

I think the only way you can solve that problem is using a partitioning method. Something like multiplying by 10^6 and then converting to an int should partition pretty well, but it will mean that if you have 1.00001*10^-6 and 0.999999*10^-6 then they will come out in different partitions, which may not be desired.
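
A sketch of what that might look like as a comparator (the 1e6 scale factor is just the example value above, and the struct with public members is an assumption for brevity):

#include <algorithm>
#include <vector>

struct path_cost {
   double length;
   double time;
};

// Bucket a length into an integer partition; lengths that land in the same
// bucket are treated as equal. Truncation matches "converting to an int",
// which is fine for non-negative path lengths.
long long length_bucket(double length) {
   return static_cast<long long>(length * 1e6);
}

// Lexicographic comparison on (bucket, time). Because the bucket is a fixed
// function of the length, this is a genuine strict weak ordering, but, as
// noted above, values just either side of a bucket boundary still end up in
// different partitions.
bool less_by_bucket(const path_cost& lhs, const path_cost& rhs) {
   const long long lb = length_bucket(lhs.length);
   const long long rb = length_bucket(rhs.length);
   if (lb != rb) return lb < rb;
   return lhs.time < rhs.time;
}

// Usage: std::sort(costs.begin(), costs.end(), less_by_bucket);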

The problem then becomes looking at your data to work out how to best partition it which I can't help with since I don't know anything about your data. :)

P.S. Do the algorithms actually crash whenever they are given such a comparison operator, or only when they encounter specific pathological cases?

Chris
The sorting algorithm crashes very, very occasionally --- we only had the problem on one dataset, and only when it was compiled with -O2 (that had to do with the 80-bit internal representation of the co-processor, we suspect). The crash happened because the algorithm ran off one end of the list, in a case that should have been impossible had the range actually been sorted.
Esben Mose Hansen
That makes sense. I was wondering if there was some magic that enabled it to test the comparison function in some way to see if it was viable or not. I'd have been dead impressed if it did that. ;-) And thanks for an interesting question. I'd not really thought about this sort of thing before. :)
Chris
+2  A: 

I can think of two solutions.

You could carefully choose a sorting algorithm that does not fail when the comparisons are intransitive. For example, quicksort shouldn't fail, at least if you implement it yourself. (If you are worried about the worst case behavior of quicksort, you can first randomize the list, then sort it.)

Or you could extend your tolerance patch so that it becomes an equivalence relation and you restore transitivity. There are standard union-find algorithms to complete any relation to an equivalence relation. After applying union-find, you can replace the length in each equivalence class with a consensus value (such as the average, say) and then do the sort that you wanted to do. It feels a bit strange to doctor floating point numbers to prevent spurious reordering, but it should work.


Actually, Moron makes a good point. Instead of union and find, you can sort by length first, then link together neighbors that are within tolerance, then do a subsort within each group on the second key. That has the same outcome as my second suggestion, but it is a simpler implementation.
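
A sketch of that simpler implementation (the path_cost layout and the 1e-6 tolerance are assumptions carried over from the question):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct path_cost {
   double length;
   double time;
};

void tolerant_lexicographic_sort(std::vector<path_cost>& costs, double tolerance = 1e-6) {
   // 1. Sort by length alone, using the plain (and perfectly transitive) < on doubles.
   std::sort(costs.begin(), costs.end(),
             [](const path_cost& a, const path_cost& b) { return a.length < b.length; });

   // 2. Walk the sorted list, linking neighbours whose lengths are within
   //    tolerance into one group, and sort each finished group by time.
   std::size_t group_begin = 0;
   for (std::size_t i = 1; i <= costs.size(); ++i) {
      const bool group_ends =
          i == costs.size() ||
          std::fabs(costs[i].length - costs[i - 1].length) >= tolerance;
      if (group_ends) {
         std::sort(costs.begin() + group_begin, costs.begin() + i,
                   [](const path_cost& a, const path_cost& b) { return a.time < b.time; });
         group_begin = i;
      }
   }
}

Every call to std::sort here is given a genuine strict weak ordering, so nothing can trip over the intransitivity; the tolerance only decides where one group ends and the next begins. Note that, through chaining, a group can end up wider than the tolerance, which is exactly how the intransitivity gets resolved.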

Greg Kuperberg
+3  A: 

Do you really want just a compare function?

Why don't you sort by length first, then group together the entries whose lengths you consider equal, and then sort within each group by time?

Once sorted by length, you can apply whatever heuristic you need, to determine 'equality' of lengths, to do the grouping.

Moron
That is actually not a bad idea, simple as it is. It would need a bit of work to get just right, but it would definitely be possible to do.
Esben Mose Hansen