ansaurus

Question

mySQL KEY Partitioning using three table fields (columns)

Answer 1

+2 A:

I doubt partitioning is as useful as you think. That said, there are a couple of other problems with what you're asking for (note: the entirety of this answer applies to MySQL 5; version 6 might be different):

columns used in KEY partitioning must be a part of the primary key. school_id, course_id and ssname are not part of the primary key.
more generally, every UNIQUE key (including the primary key) must include all columns in the partition¹. This means you can only partition on the intersection of the columns in the UNIQUE keys. In your example, the intersection is empty.
most partitioning schemes (other than KEY) require integer or null values. If not NULL, ssname will not be an integer value.
foreign keys and partitioning aren't supported simultaneously². This is a strong argument not to use partitioning.
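
As a rough illustration of the first two points, here is a sketch of a definition that MySQL 5.1 would be expected to reject. The table name foobar_bad and the single-column primary key are assumptions made for the example, since the original table definition is not shown here.

```sql
-- Hypothetical table; the name and the PRIMARY KEY (id) are assumptions.
CREATE TABLE foobar_bad (
    id         int UNSIGNED NOT NULL AUTO_INCREMENT,
    school_id  int UNSIGNED NOT NULL,
    course_id  int UNSIGNED NOT NULL,
    ssname     varchar(64) NOT NULL,

    PRIMARY KEY (id)
) ENGINE=innodb
  PARTITION BY KEY (school_id, course_id, ssname)
  PARTITIONS 2
;
-- Expected to fail: the primary key contains none of the partitioning
-- columns, so the server should refuse to create the table.
```

The error message should say, in effect, that a PRIMARY KEY must include all columns in the table's partitioning function.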

Fortunately, collision-free hashing is one thing you don't need to worry about, because partitioning is going to result in collisions (otherwise, you'd only have a single row in each partition). If you could ignore the above problems as well as the limitations on functions used in partitioning expressions, you could create a HASH partition with:

CREATE TABLE foobar (
    ...
) ENGINE=innodb
  PARTITION BY HASH (school_id + course_id + ORD(ssname))
  PARTITIONS 2
;

What should work is:

CREATE TABLE foobar (
    id         int UNSIGNED NOT NULL AUTO_INCREMENT,
    school_id  int UNSIGNED NOT NULL,
    course_id  int UNSIGNED NOT NULL,
    ssname     varchar(64) NOT NULL,

    /* some other fields */

    PRIMARY KEY (id, school_id, course_id),
    INDEX idx_fb_si (school_id),
    INDEX idx_fb_ci (course_id),
    CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
      PARTITION BY HASH (school_id + course_id)
      PARTITIONS 2
;

or:

CREATE TABLE foobar (
    id         int UNSIGNED NOT NULL AUTO_INCREMENT,
    school_id  int UNSIGNED NOT NULL,
    course_id  int UNSIGNED NOT NULL,
    ssname     varchar(64) NOT NULL,

    /* some other fields */

    PRIMARY KEY (id, school_id, course_id, ssname),
    INDEX idx_fb_si (school_id),
    INDEX idx_fb_ci (course_id),
    CONSTRAINT UNIQUE INDEX idx_fb_scs (school_id,course_id,ssname)
) ENGINE=innodb
      PARTITION BY KEY (school_id, course_id, ssname)
      PARTITIONS 2
;

As for the files that store tables, MySQL will create them, though it may do so when you define the table rather than when rows are inserted into it. You don't need to worry about how MySQL manages files. Remember, there are a limited number of partitions, defined when you create the table by the PARTITIONS *n* clause.
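
If you want to see which partitions exist without looking at the files, one option is to query INFORMATION_SCHEMA.PARTITIONS. This is a minimal sketch that assumes the table is named foobar and lives in the currently selected database. For InnoDB, TABLE_ROWS is only an estimate.

```sql
-- List the partitions of foobar in the current database.
-- Assumes one of the partitioned foobar definitions above has been created.
SELECT PARTITION_NAME,
       PARTITION_METHOD,
       PARTITION_EXPRESSION,
       TABLE_ROWS              -- approximate for InnoDB
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'foobar'
ORDER BY PARTITION_ORDINAL_POSITION;
```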

outis 2009-12-21 13:21:14

Very informative answer. Thank you outis. You address all my concerns. The reason why I am thinking of implementing partitioning is the (almost ridiculous) number of rows I will potentially be dealing with. At the last rough count, we are talking of numbers north of 150M rows for the table. If you still think partitioning will not help in this case, I would like to know your reasons. BTW, the table is already in 4NF, and all the fields are required, there is no further optimisation to be had in terms of db design.

Stick it to THE MAN 2009-12-21 13:48:04

The problem with partitioning is that it doesn't often help. Some queries will still need to consult every partition (basically, it's when you filter based on a part of the partition key, such as if you were to filter a query on `school_id` but not `course_id` under the above partitioning schemes). Indices will help much more in query optimization (both partition schemes and indices are more matters of queries than table schema). Not to say you shouldn't use partitions, but the foreign keys might be more valuable.
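
A sketch of how that difference could be checked, assuming the HASH (school_id + course_id) version of foobar above; the literal values 3 and 7 are made up. EXPLAIN PARTITIONS is the MySQL 5.1 syntax; later versions report the partitions column from plain EXPLAIN.

```sql
-- Both partitioning columns are fixed, so the optimizer can evaluate the
-- hash expression and should be able to restrict the query to one partition.
EXPLAIN PARTITIONS
SELECT * FROM foobar
WHERE school_id = 3 AND course_id = 7;

-- Only part of the partitioning expression is fixed, so any partition may
-- hold matching rows and all of them are expected to be listed.
EXPLAIN PARTITIONS
SELECT * FROM foobar
WHERE school_id = 3;
```

Compare the partitions column of the two plans to see how many partitions each query would touch.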

outis 2009-12-21 15:39:14

outis: The scenario you describe will not arise, because I will ALWAYS search using the three fields. In fact, before thinking of using partitioning, I was actually thinking of using physically separate table named info_[sid]_[cid]_ssn_records. I then realised that partitioning was a more elegant solution. If there is something I am overlooking though, I would be grateful if you could point out my oversight, before I embark down this road.

Stick it to THE MAN 2009-12-21 17:28:39

outis: Actually, I just reread your last post, where you stated "... but the foreign keys might be more valuable...". I then checked the CREATE TABLE statement you proposed, and noticed that the FKs had disappeared. Was this an oversight? - OR am I to infer that the fields used for partitioning MAY NOT be FKs to another table?

Stick it to THE MAN 2009-12-21 18:00:29

It wasn't an oversight. See the 4th point in my answer.

outis 2009-12-21 19:40:49

As for partitioning not always helping, it also depends on what limits you place on the fields. A partition will be skipped only if a query can't possibly return any row in the partition; if it might include one row, the partition will be included in the query. Indices, however, can help limit the rows a query will scan, even to the point of skipping a scan altogether. The query optimizer can make use of index key prefixes; the unique key on `(school_id, course_id, ssname)` can help a query filtered on `school_id, course_id` but not `course_id, ssname`.
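
To illustrate the key-prefix point, here is a sketch against the foobar table from the answer; the literal values are made up, and which index the optimizer actually picks depends on the data and the server version.

```sql
-- (school_id, course_id) is a leftmost prefix of idx_fb_scs, so that index
-- is a candidate for locating the matching rows directly.
EXPLAIN
SELECT ssname FROM foobar
WHERE school_id = 3 AND course_id = 7;

-- No condition on school_id: (course_id, ssname) is not a leftmost prefix
-- of idx_fb_scs, so it cannot be used for a direct lookup here. The
-- optimizer may fall back to idx_fb_ci or to a scan.
EXPLAIN
SELECT id FROM foobar
WHERE course_id = 7 AND ssname = 'example';
```

The possible_keys and key columns of each plan show which indices were considered and which one, if any, was chosen.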

outis 2009-12-21 19:49:24

The thing to do is create a 2nd database with a partitioned version of the table(s) and test it against the non-partitioned version by timing queries run on each table. Also use the EXPLAIN statement to examine how a query will be executed: http://dev.mysql.com/doc/refman/5.1/en/using-explain.html

outis 2009-12-21 19:51:24

Lots of useful info here - thanks. FKs being mutually exclusive with Partitioning is a potential spanner in the works. I am at a crossroads since ref integrity may be compromised if I remove the FKs. At the same time, I don't much like the idea of having 150+ million rows in one table. Indices will not add much in my scenario, since ALL queries on the foobar table will at the very minimum, specify the three fields in the composite PK. My problem is simply this: storing 159+ million rows (growing by around 11 million rows a year). On the other hand, I don't want to lose the ability to enforce R.I

Stick it to THE MAN 2009-12-21 22:53:15

I don't want to lose the ability to enforce Referential Integrity (due to loss of FKs). If mySQL is capable of storing this volume of data (and growth rate) in one table, then I will simply use indexing, since as I mentioned earlier, ALL queries on this table will have as a bare minimum, the 3 fields used in the composite PK.

Stick it to THE MAN 2009-12-21 22:58:41

Take a look at clustering (http://www.mysql.com/products/database/cluster/faq.html), if you have the budget for a few more servers.

outis 2009-12-21 23:18:58

Hmm, tough choice. From what you say, indices should help the query engine skip over the records not to be included in the result set. So even though the table has a huge row count, performance will not suffer too badly (hopefully). I think I will use the indices and FKs for now, and maybe use partitioning/clustering later on.

Stick it to THE MAN 2009-12-22 11:09:09
