I've read online for a while now that using indexes really speeds up your database queries.
My question is what are indexes? Why do they speed queries up?
An index is a copy of part of a table (typically the indexed columns plus a pointer to each row). Because it is smaller than the whole table, it is cheaper to keep in memory, which speeds up read operations.
In simple terms, it provides a way to find data efficiently.
Take a telephone book: it's always sorted by "last name, first name" so you can look someone up. Imagine if the phone company just added new numbers to the end of the list without any ordering: you'd have to scan millions of entries one by one to find "Smith, John".
Well, the same applies to a database table. A table without an index is called a "heap", because your data is literally a pile of unordered data. If I have a million rows, I have to look through every single row to find what I want.
Of course, it's more complex than that, but I hope this captures the essence.
The same applies anywhere: street names in an A-Z guide are always alphabetical, and entries on your bank statement are always in date order.
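If you want to see the "look through every row" versus "look it up" difference for yourself, here is a minimal sketch using Python's built-in sqlite3 module. The table, column and index names are made up for illustration, and SQLite does not store tables as a heap the way some other engines do, but the contrast between scanning and searching shows up the same way in its query plan.

```python
import sqlite3

# Hypothetical phone book table, held in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (last_name TEXT, first_name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Smith", "John", "555-0101"), ("Adams", "Mary", "555-0102")],
)

def query_plan(conn):
    """Return SQLite's description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM people WHERE last_name = ?",
        ("Smith",),
    ).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)

# Without an index the plan should describe a scan of the whole table.
plan_without_index = query_plan(conn)

# After creating an index the plan should describe a search using it.
conn.execute("CREATE INDEX idx_people_last_name ON people (last_name)")
plan_with_index = query_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan text varies between SQLite versions, so treat the printed output as a hint rather than a fixed format.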
Indexes are a complex topic. Some points:
There are different types of indexes, and they are very specific to each implementation. Basically, there is no exact science to creating indexes, yet they are crucial to an application.
IMO, this is one of the better articles on indexes for beginners (actually a set of articles). It requires an account, but it's free and a great SQL information resource.
Have you ever searched the contents of a book using its index? You normally check the index to see which page the chapter you need is on, and then jump right to that page instead of searching through all the pages.
That's quite similar to how indexes work on a table: depending on which column(s) the query filters on, the index on those column(s) is searched, which gives the location of the corresponding rows in physical storage. This is much faster than checking every row individually. Also, indexes are normally ordered (whereas the actual rows may not be), which allows faster search algorithms such as binary search to be applied.
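To make the "ordered index pointing at row locations" idea concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions: real databases generally use structures such as B-trees rather than a sorted list, and the rows, names and helper functions below are all invented for the example.

```python
import bisect

# Hypothetical unordered table: rows sit in whatever order they were inserted.
rows = [
    ("Smith", "John", "555-0101"),
    ("Adams", "Mary", "555-0102"),
    ("Jones", "Alan", "555-0103"),
    ("Baker", "Lucy", "555-0104"),
]

def full_scan(rows, last_name):
    """Check every row; the work grows linearly with the table size."""
    return [row for row in rows if row[0] == last_name]

# The toy index: sorted (key, row position) pairs, kept separate from the rows.
index = sorted((row[0], pos) for pos, row in enumerate(rows))

def index_lookup(rows, index, last_name):
    """Binary-search the sorted keys, then fetch matching rows by position."""
    # (last_name, -1) sorts before any real entry for that name,
    # because real row positions are never negative.
    i = bisect.bisect_left(index, (last_name, -1))
    matches = []
    while i < len(index) and index[i][0] == last_name:
        matches.append(rows[index[i][1]])
        i += 1
    return matches

print(full_scan(rows, "Smith"))
print(index_lookup(rows, index, "Smith"))
```

Both functions return the same rows; the difference is that the index lookup only touches a handful of entries, while the full scan touches all of them. The price is that the index has to be kept in sync whenever rows are added or changed.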