tags:

views:

266

answers:

7

I'm writing an app with a MySQL table that indexes 3 columns. I'm concerned that after the table reaches a significant amount of records, the time to save a new record will be slow. Please inform how best to approach the indexing of columns.

UPDATE

I am indexing a point_value, the user_id, and an event_id, all required for client-facing purposes. For an instance such as scoring baseball runs by player id and game id. What would be the cost of inserting about 200 new records a day, after the table holds records for two seasons, say 72,000 runs, and after 5 seasons, maybe a quarter million records? Only for illustration, but I'm expecting to insert between 25 and 200 records a day.

+5  A: 

Index what seems the most logical (that should hopefully be obvious, for example, a customer ID column in the CUSTOMERS table).

Then run your application and collect statistics periodically to see how the database is performing. RUNSTATS on DB2 (my DBMS of choice) is one example, I would hope MySQL has a similar tool.

When you find some oft-run queries doing full table scans (or taking too long for other reasons), then, and only then, should you add more indexes. It does little good to optimise a once-a-month-run-at-midnight query so it can finish at 12:05 instead of 12:07. However, it's a huge improvement to reduce a customer-facing query from 5 seconds down to 2 seconds (that's still too slow, customer-facing queries should be sub-second if possible).

More indexes tend to slow down inserts and speed up queries. So it's always a balancing act. That's why you only add indexes in specific response to a problem. Anything else is premature optimization and should be avoided.

In addition, revisit the indexes you already have periodically to see if they're still needed. It may be that the queries that caused you to add those indexes are no longer run often enough to warrant it.

To be honest, I don't believe indexing three columns on a table will cause you to suffer unless you plan on storing really huge numbers of rows :-) - indexing is pretty efficient.


After your edit which states:

I am indexing a point_value, the user_id, and an event_id, all required for client-facing purposes. For an instance such as scoring baseball runs by player id and game id. What would be the cost of inserting about 200 new records a day, after the table holds records for two seasons, say 72,000 runs, and after 5 seasons, maybe a quarter million records? Only for illustration, but I'm expecting to insert between 25 and 200 records a day.

My response is that 200 records a day is an extremely small value for a database, you definitely won't have anything to worry about with those three indexes.

Just this week, I imported a days worth of transactions into one of our database tables at work and it contained 2.1 million records (we get at least one transaction per second across the entire day from 25 separate machines). And it has four separate composite keys which is somewhat more intensive than your three individual keys.

Now granted, that's on a DB2 database but I can't imagine IBM are so much better than the MySQL people that MySQL can only handle less than 0.01% of the DB2 load.

paxdiablo
A: 

The index is there to speed retrieval of data, so the question should be "What data do I need to access quickly?". Without the index, some queries will do a full table scan (go through every row in the table) in order to find the data that you want. With a significant amount of records this will be a slow and expensive operation. If it is for a report that you run once a month then maybe thats okay; if it is for frequently accessed data then you will need the index to give your users a better experience.

If you find the speed of the insert operations are slow because of the index then this is a problem you can solve at the hardware level by throwing more CPUs, RAM and better hard drive technology at the problem.

Chris Latta
+1  A: 

Without some more details about expected usage of the data in your table worrying about indexes slowing you down smells a lot like premature optimization that should be avoided.

If you are really concerned about it, then setup a test database and simulate performance in the worst case scenarios. A test proving that is or is not a problem will probably be much more useful then trying to guess and worry about what may happen. If there is a problem you will be able to use your test setup to try different methods to fix the issue.

Zoredache
+2  A: 

Nothing for select queries, though updates and especially inserts will be order of magnitudes slower - which you won't really notice before you start inserting a LOT of rows at the same time...

In fact at a previous employer (single user, desktop system) we actually DROPPED indexes before starting our "import routine" - which would first delete all records before inserting a huge number of records into the same table...

Then when we were finished with the insertion job we would re-create the indexes...

We would save 90% of the time for this operation by dropping the indexes before starting the operation and re-creating the indexes afterwards...

This was a Sybase database, but the same numbers apply for any database...

So be careful with indexes, they're FAR from "free"...

Thomas Hansen
Did you mean "inserting a LOT of rows"?
paxdiablo
@Pax, yes - *rows* obviously ;)
Thomas Hansen
I don't know Sybase at all, but couldn't you just have deactivated the trigger during one transaction and enable it afterwards instead of deleting and creating?
Cassy
A: 

What Pax said.

For the dimensions you describe, the only significant concern I can imagine is "What is the cost of failing to index multiple db columns?"

le dorfier
A: 

Only for illustration, but I'm expecting to insert between 25 and 200 records a day.

With that kind of insertion rate, the cost of indexing an extra column will be negligible.

Stephen C
+1  A: 

I made some simple tests using my real project and real MySql database.

My results are: adding average index (1-3 columns in an index) to a table - makes inserts slower by 2.1%. So, if you add 20 indexes, your inserts will be slower by 40-50%. But your selects will be 10-100 times faster.

So is it ok to add many indexes? - It depends :) I gave you my results - You decide!

nightcoder