views:

338

answers:

8

I have a field in a database that is nearly unique: 98% of the time the values will be unique, but it may have a few duplicates. I won't be doing many searches on this field; say twice a month. The table currently has ~5000 records and will gain about 150 per month.

Should this field have an index?

I am using MySQL.

A: 

From what you say, it doesn't sound like an index is necessary. Rule of thumb is index fields that are being used in SELECTS a lot to speed up the searching, which in turn (can) slows down INSERTS and UPDATES.

On a recordset as small as yours, I don't think you will see much of a real world hit either way.

KiwiBastard
+1  A: 

That's not very many records at all; I wouldn't bother making any indexes on that table. The relative uniqueness of the field is irrelevant - even on years-old commodity hardware I'd expect a query on that table to take a fraction of a second.

Eli Courtwright
+1  A: 

you can use the general rule of thumb: optimize when it becomes a problem. Just don't use an index until you notice you need one.

Arthur Thomas
+4  A: 

I think the 'nearly unique' is probably a red herring. The data is either unique, or it's not, but that doesn't determine whether you would want to index it for performance reasons.

Answer:

5000 records is really not many at all, and regardless of whether you have an index, searches will still be fast. At that rate of inserts, it'll take you 3 years to get to 10000 records, which is still also not many.
I personally wouldn't bother with adding an index, but it wouldn't matter if you did.

Explanation:

What you have to think about when deciding to add an index is the trade-off between insertion speed, and selection speed.

Without an index, doing a select on that field means MySQL has to walk over every single row and read every single field. Adding an index prevents this.

The downside of the index is that each time data gets inserted, the DB has to update the index in addition to adding the data. This is usually a small overhead, but you'd really notice it if you had loads of indexes, and were doing a lot of writes.

By the time you get this many rows in your database, you'd want an index anyway as otherwise your selects would take all day, but it's just something to be aware about so that you don't end up adding indexes on fields "just in case I need it"

Orion Edwards
If the row size was small and the total size of the 5000 rows in the table was less than approx. 8 pages, then even if you created an index it probably would not be used by the optimiser. The SQL server ruleofthumb (with caveats) is that a column must have a selectivity of > 10% for it to be useful
Mitch Wheat
A: 

If you'll only be doing searches on it twice a month and its that few rows then I would say don't index it. Its all but useless.

A: 

It's really a judgement call. With such a small table you can search reasonably quickly without an index, so you could get by without it.

On the other hand, the cost of creating an index you don't really need is pretty low, so you're not saving yourself much by not doing it.

Also, if you do create the index, you're covered for the future if you suddenly start getting 1000 new records/week. Possibly you know enough about the situation to say for certain that that will never happen, but requirements do have a way of changing when you least expect.

EDIT: As far as changing requirements, the thing to consider is this: If the DB does grow and you find out later that you do need an index, can you simply create the index and be done? Or will you also need to change lots of code to make use of the new index?

Clayton
A: 

No. There aren't many records and it's not going to be frequently queried. No need to index.

Kon
A: 

It depends. As others have responded, there's a trade off between table update speed and selection speed. Table update includes inserts, updates, and deletes on the table.

One question you didn't address. Does the table have a primary key, and a corresponding index? A table with no indexes usually benefits form having at least one index. The most common way of getting that index is to declare a primary key, and rely on the DBMS to generate an index accordingly.

If a table has no candidates for primary key, that usually indicates a serious flaw in table design. That's a separate issue and should get a spearate discussion.

Walter Mitty