We currently have a table set up to keep track of cache file names and memcached key references. The table exists so that when a comment or other content is added, the system automatically purges the cache files or key-value pairs affected by the update. They are rebuilt the next time a user accesses the page, so the user sees the updated content. The current table structure is below:
CREATE TABLE IF NOT EXISTS `cache_references` (
`cache_id` int(21) NOT NULL AUTO_INCREMENT,
`cache_page_id` varchar(255) NOT NULL,
`cache_tag` varchar(100) NOT NULL,
`cache_group` varchar(100) NOT NULL,
`cache_expiry` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`cache_id`),
KEY `cache_page_id` (`cache_page_id`),
KEY `cache_tag` (`cache_tag`),
KEY `cache_group` (`cache_group`),
KEY `cache_expiry` (`cache_expiry`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
This is all well and good, but the table grows very quickly: a moderately sized, aggressively cached site can easily produce over 500,000 cache file references. That isn't really a problem for small to moderate sites, but it doesn't scale to large sites with many pages and assets, where it can cause table locks and/or large table scans when searching for the cache page IDs to delete.
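To make the workload concrete, here is a minimal sketch of the purge step described above. It is a simplified illustration under stated assumptions, not our production code: Python's built-in sqlite3 module stands in for MySQL, only the columns needed for the lookup are kept, and the helper names (register_reference, purge_by_tag) and sample keys are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table (assumption for illustration);
# only the columns the purge needs are reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cache_references (
        cache_id      INTEGER PRIMARY KEY AUTOINCREMENT,
        cache_page_id TEXT NOT NULL,
        cache_tag     TEXT NOT NULL,
        cache_group   TEXT NOT NULL
    )
    """
)
conn.execute("CREATE INDEX idx_cache_tag ON cache_references (cache_tag)")


def register_reference(conn, page_id, tag, group):
    """Record that a cache file / memcached key depends on a tag (hypothetical helper)."""
    conn.execute(
        "INSERT INTO cache_references (cache_page_id, cache_tag, cache_group) "
        "VALUES (?, ?, ?)",
        (page_id, tag, group),
    )


def purge_by_tag(conn, tag, delete_from_cache):
    """Look up every cache entry for a tag, delete it from the cache, then drop the rows."""
    rows = conn.execute(
        "SELECT cache_page_id FROM cache_references WHERE cache_tag = ?", (tag,)
    ).fetchall()
    page_ids = [row[0] for row in rows]
    for page_id in page_ids:
        delete_from_cache(page_id)  # e.g. unlink a file or delete a memcached key
    conn.execute("DELETE FROM cache_references WHERE cache_tag = ?", (tag,))
    conn.commit()
    return page_ids


# A plain dict plays the role of the file/memcached cache in this example.
cache = {"page_1.html": "...", "page_2.html": "...", "sidebar.html": "..."}
register_reference(conn, "page_1.html", "article_42", "articles")
register_reference(conn, "page_2.html", "article_42", "articles")
register_reference(conn, "sidebar.html", "sidebar", "widgets")

# A new comment on the (hypothetical) article 42 invalidates everything tagged with it.
purged = purge_by_tag(conn, "article_42", lambda key: cache.pop(key, None))
remaining = conn.execute("SELECT COUNT(*) FROM cache_references").fetchone()[0]
```

Every content update triggers a lookup and a delete like this against the reference table, which is where the locking and scanning cost shows up once the table holds hundreds of thousands of rows.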
My question is: what would be a possible way to improve on this solution?