Scott brings up a great point about consecutive market days. I recommend handling this with a connector table like:
CREATE TABLE `market_days` (
`market_day` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`date` DATE NOT NULL DEFAULT '0000-00-00',
PRIMARY KEY USING BTREE (`market_day`),
UNIQUE KEY USING BTREE (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0
;
As more market days elapse, just INSERT
new date
values in the table. market_day
will increment accordingly.
When inserting prices
data, lookup the LAST_INSERT_ID()
or corresponding value to a given date
for past values.
As for the prices
table itself, you can make storage, SELECT
and INSERT
operations much more efficient with a useful PRIMARY KEY
and no AUTO_INCREMENT
column. In the schema below, your PRIMARY KEY
contains intrinsically useful information and isn't just a convention to identify unique rows. Using MEDIUMINT
(3 bytes) instead of INT
(4 bytes) saves an extra byte per row and more importantly 2 bytes per row in the PRIMARY KEY
- all while still affording over 16 million possible dates and ticker symbols (each).
CREATE TABLE `prices` (
`market_day` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`ticker_id` MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT '0',
`price` decimal (7,2) NOT NULL DEFAULT '00000.00',
PRIMARY KEY USING BTREE (`market_day`,`ticker_id`),
KEY `ticker_id` USING BTREE (`ticker_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
;
In this schema each row is unique across each pair of market_day
and ticker_id
. Here ticker_id
corresponds to a list of ticker symbols in a tickers
table with a similar schema to the market_days
table:
CREATE TABLE `tickers` (
`ticker_id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
`ticker_symbol` VARCHAR(5),
`company_name` VARCHAR(50),
/* etc */
PRIMARY KEY USING BTREE (`ticker_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=0
;
This yields a similar query to others proposed, but with two important differences: 1) There's no functional transformation on the date column, which destroys MySQL's ability to use keys on the join; in the query below MySQL will use part of the PRIMARY KEY
to join on market_day
. 2) MySQL can only use one key per JOIN
or WHERE
clause. In this query MySQL will use the full width of the PRIMARY KEY
(market_day
and ticker_id
) whereas in the previous query it could only use one (MySQL will usually pick the more selective of the two).
SELECT
`market_days`.`date`,
`tickers`.`ticker_symbol`,
`yesterday`.`price` AS `close_yesterday`,
`today`.`price` AS `close_today`,
(`today`.`price` - `yesterday`.`price`) / (`yesterday`.`price`) AS `pct_change`
FROM
`prices` AS `today`
LEFT JOIN
`prices` AS `yesterday`
ON /* uses PRIMARY KEY */
`yesterday`.`market_day` = `today`.`market_day` - 1 /* this will join NULL for `today`.`market_day` = 0 */
AND
`yesterday`.`ticker_id` = `today`.`ticker_id`
INNER JOIN
`market_days` /* uses first 3 bytes of PRIMARY KEY */
ON
`market_days`.`market_day` = `today`.`market_day`
INNER JOIN
`tickers` /* uses KEY (`ticker_id`) */
ON
`tickers`.`ticker_id` = `today`.`ticker_id`
WHERE
`today`.`price` > 0
AND
`yesterday`.`price` > 0
;
A finer point is the need to also join against tickers
and market_days
in order to display the actual ticker_symbol
and date
, but these operations are very fast since they make use of keys.