tags:

views:

50

answers:

2

We have one of our tables in our database that is starting to be pretty big : 10M rows 2.14G for data
3.55G for indices

I was pretty surprised to see that the indices are almost twice as big as the data itself :/

So I showed the indices :

show index from entries;
+---------+------------+----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name                               | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| entries |          0 | PRIMARY                                |            1 | id          | A         |    13538389 |     NULL | NULL   |      | BTREE      |         | 
| entries |          0 | index_entries_on_link_and_feed_id      |            1 | link        | A         |    13538389 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          0 | index_entries_on_link_and_feed_id      |            2 | feed_id     | A         |    13538389 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          0 | index_entries_on_unique_id_and_feed_id |            1 | unique_id   | A         |    13538389 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          0 | index_entries_on_unique_id_and_feed_id |            2 | feed_id     | A         |    13538389 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          1 | index_entries_on_feed_id               |            1 | feed_id     | A         |       81556 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          1 | index_entries_on_time                  |            1 | time        | A         |      967027 |     NULL | NULL   | YES  | BTREE      |         | 
| entries |          1 | index_entries_on_created_at            |            1 | created_at  | A         |      846149 |     NULL | NULL   | YES  | BTREE      |         | 
+---------+------------+----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (1.35 sec)

As far as I can tell our code uses all the indices, yet some information might be duplicated : I am thinking that the index index_entries_on_feed_id might be a duplicate since both index_entries_on_link_and_feed_id and index_entries_on_unique_id_and_feed_id use it.

Am I right?

A: 

Hi,

you can use EXPLAIN followed by your SQL queries to get info about used indices. If some indices are not used, you can drop them.

Also, your indices: index_entries_on_link_and_feed_id index_entries_on_unique_id_and_feed_id

are the same, even their size is the same, so you can delete them...

oneee
Well, no, they are different indices! And they're used by different queries :/ and yes, we used EXPLAIN and all the index are used... which doesn't mean that if we delete one, the query solver will not try to use -successfully- another one :/
Julien Genestoux
+2  A: 

A few observations:

If unique_id is really unique, then I'd check carefully to see if feed_id is really necessary: even if it's for a one-field lookup, the gain in performance is very small.

What's the difference between id (primary) and unique_id?

It's quite possible for indices to use more space than the data if you are indexing a relatively small row several different ways.

10M rows isn't really very large, unless you're scanning the whole thing, in which case it's way too big. Providing your queries are making appropriate use of the indexes, it shouldn't really matter for another 100M rows or more.

If you're writing any moderately complex queries, involving joining 2 or 3 tables, I'd strongly recommend using EXPLAIN to check the query plan - I've had some surprising improvements from hand-tuning complex MySQL queries.

Mike Woodhouse
Thanks Mike. The unique_id is actually 'external data', and so the name might be misleading. In practice, we found that it's not really unique in our DB, but unique in the context of the external data. (Think to it as the <id> element if ATOM feed entries for example).Ok, then, I guess our situation is fine. We don't have a real performance impact so far... it's just that I wanted to be sure we're not "over-indexing", for future scalability, since this DB has a lot of writes, and quite few selects (80%/20%), and writes are where the number of index matters.
Julien Genestoux