views:

143

answers:

5

This is a direct quote from the textbook, Invitation to Computer Science by G.Michael Scneider and Judith L.Gersting.

At the end of Section 3.4.2, we talked about the tradeoff between using sequential search on an unsorted list as opposed to sorting the list and then using binary search. If the list size is n=100,000 about how many worst-case searches must be done before the second alternative is better in terms of number of comparisons?

I don't really get what the question is asking for

Sequential search is of order (n) and binary is of order (lgn) which in any case lgn will always be less than n. And in this case n is already given so what am I supposed to find.

This is one of my homework assignment but I don't really know what to do. Could anyone explain the question in plain english for me.

Thanks

+3  A: 

The question is how to decide which approach to choose - to just use linear search or to sort and then use binary search.

If you only search a couple of times linear search is better - it is O(n), while sorting is already O(n*logn). If you search very often on the same collection sorting is better - searching multiple times can become O(n*n) but sorting and then searching with binary search is again O(n*logn) + NumberOfSearches*O(logn) which can be less or more than using linear search depending on how NumberOfSearches and n relate.

The task is to determine the exact value of NumberOfSearches (not the exact number, but a function of n) which will make one of the options preferable:

 NumberOfSearches * O(n) <> O(n*logn) + NumberOfSearches * O(logn)

don't forget that each O() can have a different constant value.

sharptooth
This equation can not be solved. `O(n)` could be any number of comparisons for given n. Same for the O(n*logn). Your intent is clear, but this notation is meaningless. (Just like you can't solve `O(n)=12` ) I'd recommend using different function names (ie, `S(n)` or `NumberOfOperationsForSorting(n)`). S(n) would be exact, something like: `S(n)=2*n*log2(n)+n`, giving a solvable equation.
Ishtar
@Ishtar: Yes, you're right. But that's not an equation - there's `<>` in the middle. And yes, your proposal can really bring closer to the right answer.
sharptooth
+7  A: 

and binary is of order (lgn) which in any case lgn will always be less than n
This is where you're wrong. In assignment, you're asked to consider the cost of sorting array too.

Obviously, if you need only one search, first approach is better than sorting array and doing binary search: n < n*logn + logn. And you're asked, how many searches you need for second approach to become more effective.

End of hint.

Nikita Rybak
Unlike sharptooth's answer, your formula ignores the possibility of different constants (and lower-effect terms) in the different O terms. Even O(log(n)) can be slower than O(n) for small n, and often will be in practice.
Christopher Creutzig
@Christopher I provide no formula, I only point out where OP has misunderstood the problem. You can obviously downvote me if you think I should've posted a full solution, just don't make a mistake about my intentions. After all, this is 'homework' tag.
Nikita Rybak
You may not have intended it as a solution, but a formula you did provide. And no, I don't think you should have provided a solution, but your answer simply left out a really important aspect.
Christopher Creutzig
A: 

The question is about appreciating the number NUM_SEARCHES needed to compensate the cost of sorting. So we'll have:

 time( NUM_SEARCHES * O(n) ) > time( NUM_SEARCHES * O(log(n)) + O(n* log(n)) )
ruslik
+1  A: 

The order of the methods is not important here. It tells you something how well algorithms scale when the problem becomes bigger and bigger. You can't do any exact calculations if you only know O(n) == it complexity grows linear in the size of the problem. It won't give you any numbers.

This can well mean that an algorithm with O(n) complexity is faster than a O(logn) algorithm, for some n. Because O(log(n)) scales better when it gets larger, we know for sure, there is a n (a problem size) where the algorithm with O(logn) complexity is faster. We just don't know when (for what n).

In plain english:

If you want to know 'how many searches', you need exact equations to solve, you need exact numbers. How many comparisons does it take to search sequential? (Remember n is given, so you can give a number.) How many comparisons (in the worst case!) does it take to search with a binary search? Before you can do a binary search, you have to sort. Let's add the number of comparisons needed to sort to the cost of binary search. Now compare the two numbers, which one is less?

The binary search is fast, but the sorting is slow. The sequential search is slower than binary search, but faster than sorting. However the sorting needs to be done only once, no matter how many times you search. So, when does one heavy sort outweigh having to do a slow (sequential) search every time?

Good luck!

Ishtar
A: 

Thank you guys. I think I get the point now. Could you take a look at my answer and see whether I'm on the right track.

For worst case searches Number of comparison for sequential search is n = 100,000. Number of comparison for binary search is lg(n) = 17. Number of comparison for sorting is (n-1)/2 * n = (99999)(50000). (I'm following my textbook and used the selection sort algorithm covered in my class)

So let p be the number of worst case searches, then 100,000p > (99999)(50000) + 17p
OR p > 50008

In conclusion, I need 50,008 worst case searches to make sorting and using binary search better than a sequential search for a list of n=100,000.

Essentially, you are pretending that O(f(n)) and f(n) is the same thing and an ill-posed question is forcing you to do so. It's sad to see that ignorant or lazy instructors are planting false ideas an confusion in the heads of young people. This might get you your credits, but it compromises your ability to understand and use asymptotic analysis. 1 is O(log n), as is 2 log(n) + log log (n), as is 10^8 log n + 10^10. All that f(n) is O(log n) tells you is an upper bound on f(n) of the form C log(n) is true for n > N. Only the existence of N and C is implied, not that they have certain values.
piccolbo
piccolbo