views:

516

answers:

1

I'm having a table looking something like this:

CREATE TABLE `Calls` (
  `calendar_id` int(11) NOT NULL,
  `db_date` timestamp NOT NULL,
  `cgn` varchar(32) DEFAULT NULL,
  `cpn` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`calendar_id`),
  KEY `db_date_idx` (`db_date`)
) 
 PARTITION BY RANGE (calendar_id)(
   PARTITION p20091024 VALUES LESS THAN (20091024) ,
   PARTITION p20091025 VALUES LESS THAN (20091025));

Can I somehow use the mysql scheduler to automatically add a new partition(2 days in advance) - I'm looking for an example that would, every day add a new partition - it'd run something like

alter table Calls add partition (partition p20091026 values less than(20091026));

Where p20091026/20091026 is constructed when the scheduled task run, deriving the value from now + 2 day. (Or am I better of scripting this through cron ?)

+4  A: 

Yes, you can do this.

Note that the scheduler isn't active by default (see Event Scheduler Configuration), so it's not a zero-risk option. For example, if your operations team migrates your app to a new server, but forgets to enable the scheduler, your app will get hosed. There's also special privileges needed, which again may need to be set up on a new server.

My advice: first, create a stored procedure (see code sample below) which handles periodic partition maintenance: dropping old partitions if the table gets too big, and adding enough new partitions (e.g. 1 week) so that even if the maintenance proc isn't run for a while, your app won't die.

Then redundantly schedule calls to that stored proc. Use the MySQL scheduler, use a cron job, and use any other way you like. Then if one scheduler isn't working, the other can pick up the slack. If you design the sproc correctly, it should be cheap to execute a no-op if it doesn't need to do anything. You might even want to call it from your app, e.g. as the first statement when generating a long-running report, or as part of your daily ETL process (if you have one). My point is that the achilles heel of scheduled tasks is ensuring that the scheduler is actually working-- so think about redundancy here.

Just make sure not to schedule all the calls at the same time so they won't step on each other! :-)

Here's a code sample for what your maintenance proc could look like-- first it prunes old partitions, then adds new ones. I left error checking and preventing multiple simultaneous executions as an exerise for the reader.

DELIMITER $$

DROP PROCEDURE IF EXISTS `test`.`UpdatePartitions` $$
CREATE PROCEDURE `test`.`UpdatePartitions` ()
BEGIN

  DECLARE maxpart_date date;
  DECLARE partition_count int;
  DECLARE minpart date;
  DECLARE droppart_sql date;
  DECLARE newpart_date date;
  DECLARE newpart_sql varchar(500);

  SELECT COUNT(*)
    INTO partition_count
    FROM INFORMATION_SCHEMA.PARTITIONS
    WHERE TABLE_NAME='Calls' AND TABLE_SCHEMA='test';

  -- first, deal with pruning old partitions
  -- TODO: set your desired # of partitions below, or make it parameterizable
  WHILE (partition_count > 1000)
  DO

    -- optionally, do something here to deal with the parition you're dropping, e.g.
    -- copy the data into an archive table

     SELECT MIN(PARTITION_DESCRIPTION)
       INTO minpart
       FROM INFORMATION_SCHEMA.PARTITIONS
       WHERE TABLE_NAME='Calls' AND TABLE_SCHEMA='test';

     SET @sql := CONCAT('ALTER TABLE Calls DROP PARTITION p'
                        , CAST((minpart+0) as char(8))
                        , ';');

     PREPARE stmt FROM @sql;
     EXECUTE stmt;
     DEALLOCATE PREPARE stmt;

    SELECT COUNT(*)
      INTO partition_count
      FROM INFORMATION_SCHEMA.PARTITIONS
      WHERE TABLE_NAME='Calls' AND TABLE_SCHEMA='test';


  END WHILE;

  SELECT MAX(PARTITION_DESCRIPTION)
    INTO maxpart_date
    FROM INFORMATION_SCHEMA.PARTITIONS
    WHERE TABLE_NAME='Calls' AND TABLE_SCHEMA='test';

  -- create enough partitions for at least the next week
  WHILE (maxpart_date < CURDATE() + INTERVAL 7 DAY)
  DO

    SET newpart_date := maxpart_date + INTERVAL 1 DAY;
    SET @sql := CONCAT('ALTER TABLE Calls ADD PARTITION (PARTITION p'
                        , CAST((newpart_date+0) as char(8))
                        , ' values less than('
                        , CAST((newpart_date+0) as char(8))
                        , '));');

    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    SELECT MAX(PARTITION_DESCRIPTION)
      INTO maxpart_date
      FROM INFORMATION_SCHEMA.PARTITIONS
      WHERE TABLE_NAME='Calls' AND TABLE_SCHEMA='test';

  END WHILE;

END $$

DELIMITER ;

BTW, partition maintenance (ensuring new partitions are created in advance, pruning old partitions, etc.) is, IMHO, critically important to automate. I've personally seen a large enterprise data warehouse go down for a day because a year's worth of partitions was cretaed initially but no one remembered to create more partitions once the next year came around. So it's very good you're thinking about automation here-- it bodes well for the project you're working on. :-)

Justin Grant
When altering the table, why arent you defining what partition to modify or im missing something. For Example how does it know that is adding the partition to `calender_Id` or is it that you can only have one type of partition and because the partition is already created, it defaults to `calender_id`
Shahmir Javaid
@shahmir - the code above isn't modifying partitions, it's dropping an old partition and adding a new one. there's only one partition scheme per table. the original poster's question shows that the partitioning happens on calendar_id.
Justin Grant