Let's experiment, would you.
1. Example
Let's look at {A,B,C,D}
and sort it.
Solution 1: By sets
- Greater of
{A,B,C,D}
-> B
(thus B>A
, B>C
and B>D
)
- Greater of
{A,C,D}
-> D
(thus D>A
and D>C
)
- Greater of
{A,C}
-> A
(thus A>C
)
Total order [B,D,A,C]
Solution 2: By pairs
- Greater between
A
and C
-> A
(thus A>C
)
- Greater between
B
and D
-> B
(thus B>D
)
- Greater between
D
and A
-> D
Total order [B,D,A,C]
What's the catch ? Obviously it's more difficult by pairs, here I chose them so that the merge would be easy (none).
2. Remarks
a) Total ordering
The >
only work so well with a total ordering: ie for two given elements A
and B
of a set either A>B
or B>A
. If none of those two relations hold, then A
and B
are equivalent.
The problem with the Solution 1 approach is that if you present the user with {A,B,C,D}
and A
and B
are equivalent and greater than C
and D
... what's the answer supposed to be ?
b) Transitivity
The >
relationship is transitive, meaning that if A>B
and B>C
then A>C
. It's important to use this fact since you can therefore deduce relationships between two elements without ever asking the user.
c) What's the goal ?
Is the goal supposed to minimize the number of questions to the user or supposed to minimize the user's work ? Because obviously it's more difficult for a user to answer questions from the first solution...
3. Modeling
One can model the problem as a "graph" problem.
You begin with a set of nodes {A,B,C,D}
which represent the values you want to test.
Sorting the set is equivalent to computing the minimum set of oriented edges that link these nodes so that given any two nodes, a path leads from one to the other. I do insist of minimum.
For example, if I have 2 edges: B>A
and B>C
, then if I discover than A>C
I shall remove B>C
because I can deduce it by transitivity. The important properties is that if no two nodes are equivalent, the cardinal of the resulting set of edges is the cardinal of the set of nodes minus 1. In my example (given 4 nodes) it would be 3.
An oracle (or an extremely lucky guy) would thus be able to only ask 3 questions, and yet build this graph... that's what we should strive for.
4. How to solve this problem ?
Okay, let's suppose that we have no 2 equivalent elements. This means that if A>B
is false, then I can deduce that B>A
...
For the representation of our little graph, let's take an array:
A B C D
D . > . # . represent the unknown
C . >
B > # < and > have their usual meaning...
A
Because of the symmetry we only need a triangle.
By counting the number .
we can see the number of unknown relationships, the ideal solution is to get rid of all of those .
while asking as few questions as possible to the user.
A good question is therefore one than remove as much .
as possible in one swoop.
5. My question
And thus, I have another type of question:
Select the elements lower than "D" in the following {A,B,C}: _
This question feels better that the What is the greater... because I explicitly target those relationships I wish to know (D?A
, D?B
and D?C
) while the greater guarantees that I will obtain as much relationships but I can't know which in advance.
I try to maximize the usefulness of the question
Of course, it's no leeway for laziness: it's still important to take transitivity into account while exploiting the results of the questions.
6. Exploring larger sets
With larger sets, you can group the elements, and then merge them later on, but the merge is always a messy operation: you will probably get answers you already knew. However it's a great practice for my little question:
Given 2 ordered (disjoint) sets: [A,B,C,D,...]
and [R,S,T,U,...]
let's see the 3 questions of the toolbox:
- Which is greater:
A
or R
? _
- Which is the greatest element of
{A,B,R,S}
? _
Which elements are greater than A
in {R,S,T}
? _
Gives 1 relationship
- Gives 2 relationships: 3 of which 1 is already known
- Gives 3 relationships
The third question asks for a more elaborate answer, but it is much more suitable for merge situations. In the merge case, it's as efficient as asking to sort the elements, since it asks precisely for the only 3 relationships we don't know in the board.
7. Conclusion
You now have 4 questions in your box:
- Sort: Sort the following elements from greater to lower {A,B,C,D} ? _
- Pivot: Which elements are greater than A in this set {B,C,D} ? _
- Greater: Which element is the greater in this set {A,B,C,D} ? _
- Pair: Which is the greater element among A and B ? _
(Pair can be regarded as a degenerated case of either Greater or Pivot)
Given that each question gives n-1
relationships for n
nodes, you could try to gauge how much time it takes for a user to answer a question T(n)
and then find the n
that maximize T(n)/n
;)