Hi! I want to understand binary search better, because I don't really get it yet. Binary search requires the precondition that the array is sorted. Did I get that right? It seems like the method should check this precondition and throw an exception if it is not met. So why is checking the precondition a bad idea?
It's a bad idea because checking that the data is sorted takes n steps. The whole search takes about log(n) steps.
If you're going to check, you might as well do a linear search.
Binary search assumes that the input data are sorted, so you are correct there. In general, checking whether data are sorted takes time, so running that check before every search would make searching really inefficient.
More details.
Assume that n is the amount of your data. Binary search needs O(log(n)) operations in the worst case to find an element, while making sure the data are sorted requires O(n) operations. So if we check the precondition every time, then for very large n we will spend most of the time checking the precondition rather than doing the actual search.
And it is not hard to see when this effect kicks in. I calculated roughly how much time you would spend on pre-checking vs. actual searching:
- For 1 element you spend no time searching.
- For 2 elements you spend 50% on searching.
- For 5 elements you spend 46% on searching.
- For 20 elements you spend 22% on searching.
- For 100 elements you spend 7% on searching.
And so on. In every case the rest of the time is spent on the precondition check.
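The percentages above match the approximation log2(n) / n (search work divided by check work). A small Python sketch that reproduces them, using that assumption:

```python
import math

def percent_spent_searching(n):
    """Approximate share of total work that is actual searching,
    assuming the pre-check costs about n steps and the binary
    search costs about log2(n) steps."""
    if n < 2:
        return 0.0  # a 1-element search is essentially free
    return math.log2(n) / n * 100

for n in (2, 5, 20, 100):
    print(n, round(percent_spent_searching(n)))
```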
Yep, a binary search involves O(log n) steps and verifying that the whole sequence is sorted involves O(n) steps. From my point of view it is good to verify it in DEBUG mode, but not in RELEASE.
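In Python, for example, that debug-only idea can be expressed with an assert, which is stripped when the interpreter runs with the -O flag (the function and names here are my own sketch):

```python
def binary_search(a, target):
    # Debug-only precondition check: costs O(n), but it is
    # removed entirely when Python runs with -O (assertions off).
    assert all(a[i] <= a[i + 1] for i in range(len(a) - 1)), "input not sorted"

    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found
```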
The whole point of a binary search is that, since the data is already sorted, you can quickly locate the information you want.
Take the phone book, which is sorted by last name.
How do you find someone in the phone book? You open it to a page you guess will be close to the name you want, and then start flipping pages. But you don't flip one page at a time at first: if you missed by a lot, you flip a bunch of pages, then fewer and fewer, until finally you are looking at a single page.
This is what binary search does. Since the data is sorted, it knows it can skip a lot and do another look, and it'll focus in on the information you want.
A binary search does 1 comparison for every doubled number of items. So a 1024 element collection would require around 10 comparisons, at the most, to find your information, or at least figure out that it's not there.
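You can see that bound by counting loop iterations. A sketch in Python (the iteration counter is my own addition, not part of the usual algorithm):

```python
def binary_search_counting(a, target):
    """Return (index or -1, number of loop iterations used)."""
    lo, hi, steps = 0, len(a) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid, steps
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

data = list(range(1024))
worst = max(binary_search_counting(data, t)[1] for t in data + [-1, 1024])
print(worst)  # around 10-11 iterations for 1024 elements
```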
If, before running the actual binary search, you do a full run-through to check that the data is sorted, you might as well just scan for the information directly. A full run-through plus the binary search requires n + log2(n) operations, so for 1024 elements it would take around 1034 comparisons, whereas a simple scan for the information will on average require half of n, which is 512.
So if you can't guarantee that the data is sorted, you should not use binary search, as it will be outperformed by a simple scan.
Edit: I will say this, though: you could add a debug-only check to catch bugs in the code that is supposed to prepare the data for binary search. But, for the reasons above, this makes the total running time a lot higher, so depending on what you want from the check, you may or may not want to add it. It should not be present in release code.
In addition to what everyone else said about running time (O(n) to check all the items vs. O(log(n)) to run the binary search):
I think you are misunderstanding the idea of a pre-condition. Pre-conditions and post-conditions are a contract. If your pre-condition is true and you run your algorithm, then your post-condition will be true. If your pre-condition is false, then you make no guarantees about the post-condition.
So basically, binary search says this: if the data you give me is already sorted, then I can tell you the position of a specific piece of data, or tell you that it is not present, by performing approximately log(n) checks. If the data is not sorted, I make no guarantees about my answer.
The work that takes you from your pre-condition to your post-condition is your algorithm. In this case, binary search.
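A quick Python demonstration of the "no guarantees" part (the example list is one I made up): on unsorted input, binary search can report that a present element is missing.

```python
def binary_search(a, target):
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

unsorted = [5, 1, 4, 2, 3]
print(binary_search(unsorted, 1))          # pre-condition violated: reports -1 (not found)
print(binary_search(sorted(unsorted), 1))  # pre-condition holds: finds it at index 0
```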
The original question presupposes that you are using a binary search on a collection of data. That is not always the case. Many times you are just trying to compute a number in some interval.
Suppose you are trying to calculate the optimal speed setting for a fan. For some reason you can't find a closed form expression, so you simulate the air flow at different speed settings.
Assuming that the fan can run at any speed from 0 RPM to 5000 RPM, you don't actually have to generate a list of possible speeds. You just take the average of the previous minimum and maximum at each step of the binary search.
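That interval version can be sketched like this in Python. Here airflow() is a made-up stand-in for the simulation, which I am assuming increases monotonically with speed:

```python
def airflow(rpm):
    # Stand-in for the real air-flow simulation; assumed
    # to increase monotonically with fan speed.
    return 0.02 * rpm

def find_speed(target_flow, lo=0.0, hi=5000.0, tol=1e-3):
    """Binary search over a continuous interval: no list of
    candidate speeds is ever built, just a shrinking [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2  # average of current minimum and maximum
        if airflow(mid) < target_flow:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(find_speed(60.0))  # converges near 3000 RPM for this toy airflow()
```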