I have a table like this:

CREATE TABLE UserTrans (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) unsigned NOT NULL,
  `transaction_id` varchar(255) NOT NULL DEFAULT '0',
  `source` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`)
)

using the InnoDB engine.

transaction_id is a varchar because it can sometimes be alphanumeric.

id is the primary key.

So here is the thing: I have over 1M records, and there is a query that checks for a duplicate transaction_id on a specified source. Here is my query:

SELECT * 
  FROM UserTrans 
 WHERE transaction_id = '212398043' 
   AND source = 'COMPANY_A';

This query has become very slow: it now takes about 2 seconds to run. Should I index transaction_id and source together?
e.g. KEY join_id (transaction_id, source)

What are the drawbacks if I do that?

+4  A: 

The main drawback is that the new index will take up disk space. It will also make inserts and updates slightly slower, though in most situations the difference is negligible.

On the other hand, your query will probably run in just a few milliseconds instead of 2 seconds.
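
As a rough sketch of how to confirm this, assuming the index name join_id from the question, you could add the index and look at the EXPLAIN output for the query. The exact plan depends on your MySQL version and data.

```sql
-- Add the composite index proposed in the question.
ALTER TABLE UserTrans ADD KEY join_id (transaction_id, source);

-- With the index in place, the `key` column of the plan should show join_id,
-- and `rows` should drop from roughly the table size to a handful.
EXPLAIN
SELECT *
  FROM UserTrans
 WHERE transaction_id = '212398043'
   AND source = 'COMPANY_A';
```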

Daniel Vassallo
+1  A: 

The drawbacks of adding indexes are space (storing an index takes up space) and insert time (each new record also has to be added to every index).

That said, you may not need to index both fields - just indexing one of them may be enough.

Amber
+5  A: 

Obviously the benefit is that it will improve the performance of certain queries.

The drawback is that it will take a bit of space to store the index and a bit of work for the RDBMS to maintain the index. The index is especially prone to consume space because your transaction_id is such a wide string.

You might consider whether transaction_id really needs to be up to 255 characters long, or if you could declare its max length to be something shorter.
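
Before shrinking the column, it may be worth checking how long the stored values actually are. A minimal sketch, where the new length of 64 is only an illustrative assumption:

```sql
-- Longest transaction_id currently stored, in characters.
SELECT MAX(CHAR_LENGTH(transaction_id)) AS longest_id
  FROM UserTrans;

-- If the result is comfortably below the new limit, shrink the column.
-- 64 is a hypothetical value; choose one based on the query above.
ALTER TABLE UserTrans
  MODIFY transaction_id varchar(64) NOT NULL DEFAULT '0';
```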

Or you could use a prefix index to index only the first n characters:

CREATE INDEX join_id ON UserTrans (transaction_id(16), source(16));
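
The prefix length of 16 is just an example. One common way to pick a length is to compare how many distinct values survive at each prefix length against the full column. A sketch, with the candidate lengths 8 and 16 chosen arbitrarily:

```sql
-- The closer a prefix count is to distinct_full, the less selectivity
-- the prefix index gives up compared with indexing the whole column.
SELECT COUNT(DISTINCT transaction_id)           AS distinct_full,
       COUNT(DISTINCT LEFT(transaction_id, 8))  AS distinct_prefix_8,
       COUNT(DISTINCT LEFT(transaction_id, 16)) AS distinct_prefix_16
  FROM UserTrans;
```

Note that this scans the whole table, so on 1M+ rows it is better run once, off-peak.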

@Daniel has a good point that you might get the same benefit and save even more space by indexing only one column. Since you're doing SELECT * you've ruled out the benefit of a covering index.

Also if you intend transaction_id to be unique, why not constrain it to be unique?

CREATE UNIQUE INDEX uq_transaction_id ON UserTrans (transaction_id(16));
Bill Karwin
+1 Good idea regarding the prefix index...
Daniel Vassallo
@Bill... I think the OP could also consider using just an index on `transaction_id` alone (or a prefix index), to save space... Especially if `transaction_id` is almost unique. How would you see that?
Daniel Vassallo
Transaction id is not unique on its own. It is unique in combination with the source. :) Also, I don't actually do SELECT *; instead, I just return the record id. :)
seatoskyhk
@seatoskyhk: If you are just returning the `id`, you could also add the `id` field to the right side of the index, for some more speed. In this case, the database would have all the information in the index to serve the query, without having to read the actual table. This is called a covering index. Therefore you may want to try: `(transaction_id(16), source(16), id)`... This is obviously another space-for-speed tradeoff.
Daniel Vassallo
@Daniel Vassallo: I'd guess that if using prefix indexes, there's no benefit from a covering index. That is, the query would have to read the data rows from the table anyway, to double-check that it had matched the values.
Bill Karwin
@Bill: Yes you're right. I ignored that fact... It would have to be `(transaction_id, source, id)` to benefit from the covering index.
Daniel Vassallo
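
One point that may simplify this on InnoDB: secondary indexes there already store the primary key columns in each entry, so a full-column index on (transaction_id, source) can typically serve a query that returns only id, without listing id explicitly. A sketch for checking this, assuming the join_id index from the question has been created on the full columns (no prefixes):

```sql
EXPLAIN
SELECT id
  FROM UserTrans
 WHERE transaction_id = '212398043'
   AND source = 'COMPANY_A';
-- When the index covers the query, the Extra column includes "Using index".
```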
+1  A: 

I would think about ditching your id column and using transaction_id as your primary key. I am assuming that transaction_id is unique.

This will mean that your schema prevents you from inserting a transaction id that is already there.

It also reduces the amount of data being stored and the number of columns that need to be indexed.

If source and transaction_id are in fact a composite key, I would make the two columns the primary key.

Your current schema allows you to put in duplicates, which is an unnecessary evil.

Bingy
Unfortunately, the transaction id is unique per source, but some sources don't have a transaction id, so that's why.
seatoskyhk
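
For the case of sources without a transaction id, one possible approach is to store NULL for the missing ids, since a MySQL unique index permits multiple NULL values. A sketch under stated assumptions: that '0' (the column default) is what currently marks a missing id, and that the index name uq_source_transaction is free to use.

```sql
-- Allow NULL so that "no transaction id" no longer needs a placeholder value.
ALTER TABLE UserTrans
  MODIFY transaction_id varchar(255) NULL DEFAULT NULL;

-- Assumption: rows holding the default '0' are the ones without a real id.
UPDATE UserTrans
   SET transaction_id = NULL
 WHERE transaction_id = '0';

-- Enforce uniqueness per source; rows with a NULL transaction_id are exempt.
-- This fails if real duplicates already exist, so clean those up first.
-- Depending on character set and row format, full-length columns may exceed
-- InnoDB's index key length limit; shorter columns avoid that.
ALTER TABLE UserTrans
  ADD UNIQUE KEY uq_source_transaction (source, transaction_id);
```

This index also serves the duplicate-check query from the question, so a separate join_id index would not be needed.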