It's easy to make a big deal out of entropy. To my mind it is a pretty simple and useful concept.
Basically it quantifies what, on average, you will learn from an event, like flipping a coin, taking a branch instruction, or indexing an array.
For example, a comparison operation in the middle of a search algorithm has some probability P of taking one branch, and 1-P of taking the other.
In general, taking a branch whose probability is P tells you log(1/P) bits, base 2. Suppose P is 1/2, as it is in a binary search. Then if you take that branch, you know 1 bit more than you did before, because log(2/1), base 2, is 1. If you take the other branch you also learn 1 bit.
To get the average amount of information you will learn, take what you learn on the first branch times the probability of that branch, plus what you learn on the second branch times the probability of that branch.
1/2 times 1 bit, plus 1/2 times 1 bit, is 1/2 bit plus 1/2 bit, or total 1 bit of entropy. That's what you can expect to learn on average from that decision.
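Here's that arithmetic as a quick sketch in Python (the name branch_entropy is just mine, nothing standard):

```python
import math

def branch_entropy(p):
    """Expected information, in bits, from a two-way decision
    that goes one way with probability p and the other with 1-p."""
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Binary search: each comparison splits the remaining possibilities in half.
print(branch_entropy(1 / 2))   # 1.0 bit per decision
```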
On the other hand, suppose you are doing linear search in a table of 1024 entries.
On the first == test, the probability of YES is 1/1024, so the entropy of YES at that decision is
1/1024 times log(1024/1)
or 1/1024 * 10 = about 1/100 of a bit.
So if the answer is YES, you learn 10 bits, but the chance of that is about 1 in a thousand.
On the other hand, NO is much more likely. Its entropy is
1023/1024 * log(1024/1023)
or roughly 1 times roughly zero = about zero.
Add the two together, and on average you will learn about 1/100 of a bit on that decision.
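Plugging the linear-search numbers into the same formula (again just a sketch, assuming a 1024-entry table):

```python
import math

n = 1024
p_yes = 1 / n   # chance the first == test hits the entry you want

yes_term = p_yes * math.log2(1 / p_yes)              # about 0.0098 bits
no_term = (1 - p_yes) * math.log2(1 / (1 - p_yes))   # about 0.0014 bits

print(yes_term + no_term)   # about 0.011 bits, roughly 1/100 of a bit
```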
That's why linear search is slow. The entropy (how much you can expect to learn) at each decision is too small, since you're going to have to learn 10 bits to find the entry in the table.
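To put rough numbers on that (a back-of-the-envelope sketch, assuming the same 1024-entry table and a target that is equally likely to be anywhere in it):

```python
import math

n = 1024
bits_needed = math.log2(n)   # 10 bits to pin down one entry out of 1024

# Binary search gets about 1 bit per comparison, so about 10 comparisons do it.
binary_comparisons = math.ceil(bits_needed)   # 10

# Linear search averages (n + 1) / 2 == tests for a random target:
# hundreds of comparisons, which fits with each early test
# yielding only about 1/100 of a bit.
linear_comparisons = (n + 1) / 2   # 512.5

print(bits_needed, binary_comparisons, linear_comparisons)
```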