tags:

views:

374

answers:

5

Is it better to sort then binary search or simply linear search?

Thanks

+6  A: 

No, because Quick Sort is slower than linear, and you add some more.

O(n*log(n)) + O(log(n)) = O(n*log(n)) > O(n)

Edit: Not precisely correct as pointed out, as worst case of Quick Sort is even slower with O(n^2). n*log(n) is best case and average case, which provides a lower bound is already slower than linear. The correct symbol might though be Θ instead of O.

Eiko
Is it valid to use Big O with comparison operators like that?
Nabb
Hmmm... Not sure I'd recommend that in an exam, but it's easy to read and understand.
Eiko
@Nabb: Sure, but they don't act quite like ordinary numbers. Recall that big-O is in the asymptotic limit of large n, so `O(log n) << O(n) << O(n*log n)` and you can simple neglect non-leading terms.
dmckee
@Nabb: It's fine as long as we're clear on what it means. It means that given two fixed functions f(n) and g(n) which are ϴ(n log n) and ϴ(n) respectively (not just O, which is an upper bound), then for *sufficiently large n*, f(n) > g(n). Of course, for smaller n it may be the other way around, and if the bounds are only upper bounds, then the function that is O(n log n) may actually be much smaller (in the case of sorting, we know it's not).
ShreevatsaR
Quick sort is not `O(n*log(n))`, it's `O(n^2)`.
R..
Indeed, worst case is O(n^2). Best and average is Θ(n*log(n)). I edit to use Θ instead of O.
Eiko
+20  A: 

It depends how often you want to search after the sort - if only once, then a linear search will probably be faster. Of course, an even better bet is normally (but not always) to maintain things in sorted order using something like set or a map.

anon
This is the best answer.
DeadMG
Depends on the size of the container. For small containers, a linear search is faster than a binary search.
Thomas Matthews
@Thomas It depends on lots of things.
anon
+1 - everyone knows (or should know) that O(n*log(n)) > O(n)... the real question is _how many linear searches do you have to do before the benefit of sorting kicks in?_
D.Shawley
How is a set sorted in any way?
Eiko
@Eiko In the same way that any binary tree is sorted.
anon
@Eiko: Most languages provide one or more implementations of set. They are either implemented as an ordered data structure (e.g. BST or Skiplist) or an unordered hashtable. Both options are equally applicable in this case IMO.
MAK
@MAK - The question tags list C and C++. C has no standard library called "set". C++ has one, and it is an ordered container. A C++ unordered set is called "unordered_set" - and even that's TR1, which isn't strictly standard IIRC.
Steve314
why "probably" ? a sort must at least read the whole list ...
6502
@6502 Because it depends on the specific implementations of the sort, the binary chop search and the sequential search.
anon
@Neil - hard to imagine a sane linear search thats slower than a sane sort. Even in the best special case, where the items are already sorted and the sort algorithm handles that well, a check is still needed for every item to confirm that they're already sorted.
Steve314
very improbable: linear search starts from the end up, destroying sequentiality. sort wins by cache hits. though this still doesn't warrant the use of "probably" :)
aib
@Steve314: I don't see C or C++ in the question tags.
MAK
@MAK That's because it was edited by a nitwit - I've rolled it back. And when making comments like this you should always check the edit history first.
anon
@MAK - either I'm finally going completely insane (previously, I was only sure of "mildly insane"), or the tags have been changed. FWIW, I don't remember the search, efficiency, speed or performance-comparison tags being there - only the algorithm tag and (as I said) C and C++. Maybe the author (or someone else) realised the implementation language isn't really relevant to an algorithm issue, and edited?
Steve314
Aaaarrrggghhh!!! - everything keeps changing! It's worse than a requirements doc!
Steve314
@steve314, @Neil Butterworth: Yes, I see that the tags did actually include c++ and c when I posted my first comment, so you are right.
MAK
@aib - a linear search in an array is near perfect for cache. Possibly the most cache-friendly sort is the bubble-sort (yuck!), and quicksort is IIRC fairly good, but not as good as a linear search. Perhaps you're thinking of a linear search of a linked list? (if sort-then-binary-search is a viable option this seems unlikely, though the question doesn't specify the form of the input).
Steve314
@Steve314 I was thinking of a linear search starting from the end going to the beginning of the array - it's still linear, right? :)
aib
@aib - yes, and just as near-perfect for cache. The cache doesn't care whether you step forwards or backwards, as long as each step is normally within the same cache page. If that wasn't true, quicksort would be very bad for cache - each partition steps two pointers, one moving forwards through the array, the other backwards, both meeting at the pivot. Since forwards vs. backwards isn't an issue for the cache, the only issue is you typically have more relevant cache pages in use at any time than for linear search through an array.
Steve314
Ah right, I was thinking more of a "read the following couple of bytes" cache - not in terms of pages. Still, it was a bit tongue-in-cheek; you cannot beat a linear search.
aib
A: 

It depends on the input. Linear search is usually faster for small number of elements.

frunsi
A _single_ linear search is *always* faster than a single sort followed by a single binary search. However, there is a point at which a single sort followed by multiple binary searches will be faster than multiple linear searches.
D.Shawley
+2  A: 

Linear search is O(n), sorting is O(n log n), binary search is O(log n). Thus, if you expect to search more than log n times, sorting is better. (Of course this is just theoretical performance, on practice you're also interested in constants, esp. for not-too-large n)

Roman Cheplyaka
+2  A: 

Forgetting the binary search for a minute:

Quick sort best case: O(n log n)

Quick sort worst case: O(n^2)

Linear search worst case: n

So regardless of what you do after quick sort, you'd already have found the item if you'd done a linear search. However, this only holds for the first search. Say you had more than one search to do. If you retain the (Quick)sorted list, then all following searches have a worst case of O (log n), so over several searches (particularly on larger datasets), sorting and searching would be wise. As always with programming and software design in general, there is never a 'one size fits all', so thinking about what patterns and implementations are most appropriate to your particular domain in essential.

Noel M