How would you explain to some one how indexes improve the performance of the database when used judiciously ? I am looking for a good, clear explanation since it's too complex out there in the books.
I would use the classic explanation:
An index provides an orderly and speedy method to traverse a data set.
The classic example is the phone book - it is INDEXED by name alphabetically - this speeds up your access as you can get directly to the name you want to find using the indexing method.
The database is really no different - you are not having to scan the entire table for your employee with EmployeeId = 123. You are simply scanning the index, stored in a known order, which ends up performing much better.
A database index helps the database lookup records quickly in a database table, a lot like an index helps you look up information in a book quickly. I think the key is to index the right data, so that the database can quickly lookup the data that makes the most sense. With the book example, you wouldn't index every single word, you would just index key words that a user is most likely going to want to lookup.
If the key word in your question is "Judiciously" then it is important to mention that where the benefit of an index is speedy queries, the trade-offs are speed and size.
Just like in the phone book, it takes a bit of extra time to maintain the index, and also a bit of space for the index itself. Whenever a record is added to or removed from the database some time has to be spent to update the index.
So going overkill with indexes on a database with a high rate of inserts etc might not be considered judicious use. However, thoughtfully using indexes to help speed up queries can be a massive benefit to performance.
Stephane Faroult, SQL Indexing in 9 Minutes and a Half [YouTube video].
When searching for data, if the fields used to search on are indexed, you get a direct reference to the data(or range of data, but leave that out for the non-technical talk).
If they are not indexed, then it has to search the whole table doing the comparison for a match.
Bear with me, this will take awhile :-).
Think of a simple address book where you just add records on the end as new friends or colleagues arrive (the next entry would go at 5):
1. Bob Smith, 7 Station St, Wotahole, NJ
2. Greg Jones, 3 Railway Pde, Boot Hill, KA
3. Allan Brown, 27 Carriage Court, Washington, DC (home)
4. Allan Brown, 1066 Hastings Street, Washington, DC (work)
5.
Now you need to find someone's address. No trouble, I hear you say, just scan the list looking for the name, then read off the address.
Now, what if you're so stunningly popular that you have 1,024 friends like me (I'm such a geek, I only allocate friends in powers of two - I actually have 2,024 but 1,000 of them are being held in limbo unitl I can get another 24 together :-).
In order to find a particular friend, you'd have to scan, on average, 512 entries (half of those in use). That's quite tedious. The worst case is scanning all 1,024 of them to find the last person you added.
Now let's add that index. Every time you add a new friend/colleague (or delete them if they cause you too much trouble), you update this index which stores just the name in sorted order along with the line number of the full entry (the index pages in your address book are magic and auto-sort everything you write in them).
The index for out mini-list above would be:
1. Allan Brown, 3
2. Allan Brown, 4
3. Greg Jones, 2
4. Bob Smith, 1
The names and line numbers take up less space than the full entries but the most important aspect is this.
In order to find an entry, you only have to scan, worst case, 10 entries (log21024). First, you check index number 512. If the name you're looking for is greater than that, you only have to look at entries 513-1024. If it's less, you're now only interested in entries 1-511. In either case, you've immediately cut down your search space by half.
With the original method, you can only discard the one you check since you don't have the ordering information available to you.
So the size of the search space goes like this (I've actually used powers of two for the indexed method but it's slightly better than that):
+-----------+----------------+------------+
| Iteration | Indexed method | Old method |
+-----------+----------------+------------+
| 0 | 1024 | 1024 |
| 1 | 512 | 1023 |
| 2 | 256 | 1022 |
| 3 | 128 | 1021 |
| 4 | 64 | 1020 |
| 5 | 32 | 1019 |
| 6 | 16 | 1018 |
| 7 | 8 | 1017 |
| 8 | 4 | 1016 |
| 9 | 2 | 1015 |
| 10 | 1 | 1014 |
+-----------+----------------+------------+
Once you've found the index, extract the line number from it and, because you know you've got 16 entries per page, entry number 275 (for example) is on page 18, line 4. You can go straight there without further searching.
So, at the cost of a little more storage space and some time maintaining the index, you've greatly increased the speed of your searches. And that's what indexes do in databases as well.