views:

106

answers:

4

I have a set of numbers to compare. Let's say that I have to get this comparison from a user. I have a choice of either asking him a question consisting of 2 numbers or 3 numbers of 4 numbers. For instance, I can ask any of the following questions:

  • Which of the numbers is greater? 2 OR 4
  • Which of the numbers is greater? 2 OR 3 OR 4
  • Which of the numbers if greater? 2 OR 3 OR 4 OR 5

My goal is to minimize the number of questions I ask the user in some combination that will ultimately give me an ordering of all the numbers in the set... For instance, if there were just 4 numbers {2,3,4,5}, I could've asked him the third type of question where I give him 4 numbers to compare. But in the ecosystem I am designing this for, the user gets annoyed with long questions so I would want to minimize the number of questions of this type that I would ask. So if each of the questions has a particular weight, I a trying to figure out a way to ultimately get the ordering of all numbers but at the same time pose the user with minimum trouble.

Is there a good way of solving this problem? Does anyone see this as belonging to a general class of problems or am I just making it too complex? Any suggestions?

A: 

You could present the questions through the algorithms of quicksort or mergesort. Minimize the number of comparisons and guarantee the correct result.

Matt
-1: But what elements do you pick when trying to take advantage of the fact that you can ask the user 4 comparisons at a time?
Chris H
Don't do 4 comparisons at a time. By doing such a comparison (such as 2,3,4,5) you only know that 5 is greater than 2, 3, and 4, but gain no ordering information. Just compare two for every question.EDIT: Just read your answer. Good idea.
Matt
by asking who is the greatest of 1,2,3,4,5 you know 5>4, 5>3, 5>2, 5>1....that's more information than just one.
Chris H
+1  A: 

You could just do a merge sort or quick sort algorithm where the comparison function is asking the user the two number version of the question, and then the most questions they'd have to answer is n log n.

Then the question is just, does the ability to ask which is the greatest of 3 or 4 reduce the number of questions you have to ask. My intuition says no but I'd be hard pressed to prove it.

Eric Warmenhoven
+2  A: 

If you are sorting the array then you generally sort in O(n log(n)) time. That means that you sometimes compare pairs of numbers more than once. If you ask 4 questions constantly then you get extra information that you can store.

When you get to a point in the sort where you ask a>b, then you may as well ask is a the largest of {b,c,d,e,...}. That way you never have to ask a>c, a>d, etc. If you store these known comparisons in a table then you never have to do them again. As for picking exactly what caparison sets to ask the user I'm not sure, and it's certainly a good question.

My intuition about which sets to put in the question to user user would be to look at how quick-sort works. It splits the data into two parts, and numbers that aren't in the same group are never compared, so that way you know what NOT to put in the set to ask the user. Use 4 numbers as long as possible until the quicksort gets down into groups that have fewer than 4 numbers.

Chris H
A very nice explanation... Thank you for that... On a related note to your second paragraph, my only thoughts here were, if at all in the algorithm you come to a point where you've used a number of size-4 questions to build different sets and are at a stage where you need to now merge them, how does one pick the size? That would be interesting to know because not all questions have the same weight... As you suggested, I will go through quick sort in more detail.. but if you have any more comments, I would greatly appreciate them...
Legend
I like this idea. +1
Matt
@Legend: I'm sorry I don't understand your comment, but it's late here. If you clarify I can check tomorrow. ie: What is " where you've used a number of size-4 questions to build different sets and are at a stage where you need to now merge them, how does one pick the size?" size of what? I'm saying just do that algorithm normally, but always check and update the table and use smart question sets. I only used quicksort as an example,
Chris H
Oh I see... understood... You can answer this tomorrow when you have time then... Just a minor thought is that the complexity is O(nlog n). If n is large, then I am sure I will end up annoying the user. I am perhaps ready to compromise some accuracy for some speedup... I am googling currently but I wonder if there are any probabilistic ways of doing this in less than O(nlog n)
Legend
+3  A: 

Let's experiment, would you.

1. Example

Let's look at {A,B,C,D} and sort it.

Solution 1: By sets

  • Greater of {A,B,C,D} -> B (thus B>A, B>C and B>D)
  • Greater of {A,C,D} -> D (thus D>A and D>C)
  • Greater of {A,C} -> A (thus A>C)

Total order [B,D,A,C]

Solution 2: By pairs

  • Greater between A and C -> A (thus A>C)
  • Greater between B and D -> B (thus B>D)
  • Greater between D and A -> D

Total order [B,D,A,C]

What's the catch ? Obviously it's more difficult by pairs, here I chose them so that the merge would be easy (none).

2. Remarks

a) Total ordering

The > only work so well with a total ordering: ie for two given elements A and B of a set either A>B or B>A. If none of those two relations hold, then A and B are equivalent.

The problem with the Solution 1 approach is that if you present the user with {A,B,C,D} and A and B are equivalent and greater than C and D... what's the answer supposed to be ?

b) Transitivity

The > relationship is transitive, meaning that if A>B and B>C then A>C. It's important to use this fact since you can therefore deduce relationships between two elements without ever asking the user.

c) What's the goal ?

Is the goal supposed to minimize the number of questions to the user or supposed to minimize the user's work ? Because obviously it's more difficult for a user to answer questions from the first solution...

3. Modeling

One can model the problem as a "graph" problem.

You begin with a set of nodes {A,B,C,D} which represent the values you want to test.

Sorting the set is equivalent to computing the minimum set of oriented edges that link these nodes so that given any two nodes, a path leads from one to the other. I do insist of minimum.

For example, if I have 2 edges: B>A and B>C, then if I discover than A>C I shall remove B>C because I can deduce it by transitivity. The important properties is that if no two nodes are equivalent, the cardinal of the resulting set of edges is the cardinal of the set of nodes minus 1. In my example (given 4 nodes) it would be 3.

An oracle (or an extremely lucky guy) would thus be able to only ask 3 questions, and yet build this graph... that's what we should strive for.

4. How to solve this problem ?

Okay, let's suppose that we have no 2 equivalent elements. This means that if A>B is false, then I can deduce that B>A...

For the representation of our little graph, let's take an array:

  A B C D  
D . > .      # . represent the unknown
C . >        
B >          # < and > have their usual meaning...
A 

Because of the symmetry we only need a triangle.

By counting the number . we can see the number of unknown relationships, the ideal solution is to get rid of all of those . while asking as few questions as possible to the user.

A good question is therefore one than remove as much . as possible in one swoop.

5. My question

And thus, I have another type of question:

Select the elements lower than "D" in the following {A,B,C}: _

This question feels better that the What is the greater... because I explicitly target those relationships I wish to know (D?A, D?B and D?C) while the greater guarantees that I will obtain as much relationships but I can't know which in advance.

I try to maximize the usefulness of the question

Of course, it's no leeway for laziness: it's still important to take transitivity into account while exploiting the results of the questions.

6. Exploring larger sets

With larger sets, you can group the elements, and then merge them later on, but the merge is always a messy operation: you will probably get answers you already knew. However it's a great practice for my little question:

Given 2 ordered (disjoint) sets: [A,B,C,D,...] and [R,S,T,U,...] let's see the 3 questions of the toolbox:

  1. Which is greater: A or R ? _
  2. Which is the greatest element of {A,B,R,S} ? _
  3. Which elements are greater than A in {R,S,T} ? _

  4. Gives 1 relationship

  5. Gives 2 relationships: 3 of which 1 is already known
  6. Gives 3 relationships

The third question asks for a more elaborate answer, but it is much more suitable for merge situations. In the merge case, it's as efficient as asking to sort the elements, since it asks precisely for the only 3 relationships we don't know in the board.

7. Conclusion

You now have 4 questions in your box:

  • Sort: Sort the following elements from greater to lower {A,B,C,D} ? _
  • Pivot: Which elements are greater than A in this set {B,C,D} ? _
  • Greater: Which element is the greater in this set {A,B,C,D} ? _
  • Pair: Which is the greater element among A and B ? _

(Pair can be regarded as a degenerated case of either Greater or Pivot)

Given that each question gives n-1 relationships for n nodes, you could try to gauge how much time it takes for a user to answer a question T(n) and then find the n that maximize T(n)/n ;)

Matthieu M.
Thanks for your patience... :) That was really helpful...
Legend
When the questions are interesting... :)
Matthieu M.