I just heard that you should create an index on any column you're joining or querying on. If the criterion is this simple, why can't databases automatically create the indexes they need?
Because databases simply store and retrieve data - the database engine has no clue how you intend to retrieve that data until you actually do it, by which time it is too late to create the index. And the column you are joining on may not be suitable for an efficient index anyway.
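To make that second point concrete, here's a small sketch (the customers/orders schema and index names are made up for illustration): a join column like customer_id is a reasonable candidate for the rule you heard, while a low-cardinality flag usually is not.

    CREATE TABLE customers (
        id   INT PRIMARY KEY,
        name VARCHAR(100)
    );

    CREATE TABLE orders (
        id          INT PRIMARY KEY,
        customer_id INT,        -- join column: a good index candidate
        is_archived CHAR(1)     -- two-valued flag: usually a poor candidate
    );

    -- The "index every join column" rule would suggest this:
    CREATE INDEX ix_orders_customer_id ON orders (customer_id);

    SELECT c.name, o.id
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.id;

    -- An index on is_archived, by contrast, rarely helps: with only two
    -- distinct values, scanning the table is often cheaper than the index.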
An RDBMS could easily self-tune and create indices as it saw fit but this would only work for simple cases with queries that do not have demanding execution plans. Most indices are created to optimize for specific purposes and these kinds of optimizations are better handled manually.
Well, they do; to some extent at least...
See SQL Server Database Engine Tuning Advisor, for instance.
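Beyond the graphical advisor, SQL Server also records "missing index" suggestions that you can query yourself. A rough sketch of the kind of query a DBA might run (assuming SQL Server 2005 or later; only a few of the available columns are shown):

    -- The engine records indexes it thinks are missing, based on the
    -- queries it has actually seen. These are suggestions only; a DBA
    -- still has to decide whether they are worth their write cost.
    SELECT statement AS table_name,
           equality_columns,
           inequality_columns,
           included_columns
    FROM   sys.dm_db_missing_index_details;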
However, creating optimal indexes is not as simple as the rule you mention. An even simpler rule would be to create indexes on every column (which is far from optimal)!
Indexes are not free. You create indexes at the cost of storage and update performance, among other things. They need to be chosen carefully to be worth having.
It's a non-trivial problem to solve, and in many cases a sub-optimal automatic solution might actually make things worse. Imagine a database whose read operations were sped up by automatic index creation but whose inserts and updates got hosed as a result of the overhead of managing the index? Whether that's good or bad depends on the nature of your database and the application it's serving.
If there were a one-size-fits-all solution, databases would certainly do this already (and there are tools to suggest exactly this sort of optimization). But tuning database performance is largely an app-specific function and is best accomplished manually, at least for now.
Every index you add may increase the speed of your queries. It will decrease the speed of your updates, inserts, and deletes, and it will increase disk space usage.
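As an illustrative sketch (table and index names are made up): every secondary index below is one more structure the engine must maintain on each write, on top of the extra disk space it occupies.

    -- Three secondary indexes on one table.
    CREATE TABLE products (
        id       INT PRIMARY KEY,
        sku      VARCHAR(40),
        category VARCHAR(40),
        price    DECIMAL(10, 2)
    );
    CREATE INDEX ix_products_sku      ON products (sku);
    CREATE INDEX ix_products_category ON products (category);
    CREATE INDEX ix_products_price    ON products (price);

    -- This one insert now writes to the table *and* to all three indexes,
    -- and each index also takes its own disk space. The same maintenance
    -- cost applies to UPDATEs of indexed columns and to DELETEs.
    INSERT INTO products (id, sku, category, price)
    VALUES (1, 'ABC-123', 'books', 9.99);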
I, for one, would rather keep that control to myself, using tools such as DB Visualizer and EXPLAIN statements to provide the information I need to evaluate what should be done. I do not want a DBMS unilaterally deciding what's best.
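For instance, something along these lines is what I'd run before adding anything (exact syntax varies by engine; this is roughly the PostgreSQL/MySQL form, and the orders table is just an example):

    -- Ask the optimizer how it plans to execute the query; the plan shows
    -- whether it would use an index or fall back to a full table scan.
    EXPLAIN
    SELECT *
    FROM   orders
    WHERE  customer_id = 42;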
It's far better, in my opinion, that a truly intelligent entity makes the decisions regarding database tuning. The DBMS can suggest all it wants, but the final decision should be left up to the DBAs.
What happens when the database usage patterns change for one week? Do you really want the DBMS creating indexes and destroying them a week later? That sounds like a management nightmare scenario right up alongside Skynet :-)
This is a good question. Databases could create the indexes they need based on data usage patterns, but this means that the database would be slow the first time certain queries were executed and then get faster as time goes on. For example, if there is a table like this:
    ID        USERNAME
    --        --------
then the username would be used to look up users very often. After some time the database could see that, say, 50% of queries did this, in which case it could add an index on the username column.
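At that point, the index it would create on its own is exactly the one a DBA would add by hand today (assuming the table is called users):

    -- Index the column used by the lookups the engine keeps seeing.
    CREATE INDEX ix_users_username ON users (username);

    -- The query pattern that benefits from it.
    SELECT id, username
    FROM   users
    WHERE  username = 'alice';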
However, the reason this hasn't been implemented in any great depth is simply that it is not a killer feature. Adding indexes is something a DBA does relatively rarely, and automating it (which is a very big task) is probably just not worth it for the database vendors. Remember that to enable automatic indexing, every query would have to be analyzed, along with its response time and result set size, so it is non-trivial to implement.