Hi everyone,
I have a somewhat complicated set of tables for which I need help constructing and optimizing some SQL queries. Currently much of the logic used to obtain the results we need lives at the app layer, which results in terrible performance due to full table traversals, etc. SQL is not my strong suit, so I thought I'd reach out to the SO crowd to see if anybody could lend a hand.
Infrastructure Background:
- DB is MySQL 5
- We're accessing this data via Hibernate using Java
- Most of these tables' contents are relatively static. The exception is the salesperson_hourly_performance table, which contains a row for each hour of each day in which a given salesperson is active (e.g., has made or received a call), holding the running tally of that salesperson's performance for the entire day. Given the number of salespeople across the companies in question, this table can grow by 20K+ rows per day.
Data Objects:
I've created a simplified version of the table setup which incorporates the relevant data. The "real" tables have about 20 companies, 300 divisions, 20K salespeople, and millions of rows of salesperson performance data.
CREATE TABLE `so_test`.`company` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;
INSERT INTO company VALUES (7, 'CompanyXX');
CREATE TABLE `so_test`.`division` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(45) NOT NULL,
`companyId` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=18 DEFAULT CHARSET=latin1;
INSERT INTO division VALUES (17, 'APAC #1', 7);
CREATE TABLE `so_test`.`salesperson` (
`id` int(10) unsigned NOT NULL auto_increment,
`divisionId` int(10) unsigned NOT NULL,
`name` varchar(45) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=213860 DEFAULT CHARSET=latin1;
INSERT INTO salesperson VALUES (213859, 17, 'bob jones');
CREATE TABLE `so_test`.`salesperson_hourly_performance` (
`id` int(10) unsigned NOT NULL auto_increment,
`timestamp` DATETIME NOT NULL,
`salesPersonId` int(10) unsigned NOT NULL,
`callsInBound` int(10) unsigned NOT NULL,
`callsOutBound` int(10) unsigned NOT NULL,
`issuedOrders` int(10) unsigned NOT NULL,
`salesRevenue` decimal(10,4) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=552395 DEFAULT CHARSET=latin1;
INSERT INTO salesperson_hourly_performance VALUES (552394, '2009-05-03 22:00:00', 213859, 15, 17, 14, 10798.0478),
(551254, '2009-05-03 21:00:00', 213859, 14, 16, 13, 9802.3620),
(551115, '2009-05-03 20:00:00', 213859, 13, 14, 12, 9183.8250),
(550072, '2009-05-03 19:00:00', 213859, 11, 13, 11, 8490.8678),
(549613, '2009-05-03 18:00:00', 213859, 10, 11, 9, 7230.1125),
(549389, '2009-05-03 17:00:00', 213859, 9, 10, 8, 6486.2173),
(548861, '2009-05-03 16:00:00', 213859, 7, 9, 7, 5537.8553),
(548059, '2009-05-03 15:00:00', 213859, 6, 8, 6, 4663.8469),
(547466, '2009-05-03 14:00:00', 213859, 5, 7, 5, 4082.6388),
(546729, '2009-05-03 13:00:00', 213859, 4, 6, 4, 3057.7368),
(546611, '2009-05-03 12:00:00', 213859, 3, 5, 2, 1751.6135),
(545642, '2009-05-03 11:00:00', 213859, 2, 4, 2, 1751.6135),
(545558, '2009-05-03 10:00:00', 213859, 1, 3, 0, 0.0000),
(545072, '2009-05-03 09:00:00', 213859, 1, 2, 0, 0.0000),
(565071, '2009-05-04 13:00:00', 213859, 19, 17, 6, 4200.1710),
(575070, '2009-05-06 14:00:00', 213859, 0, 2, 1, 120.0000);
Business requirements:
- Populate a set of web-based sales performance "dashboard" UIs which provide a separate performance overview for the companies, the divisions, and the individual sales people.
- The UIs are largely similar to one another, aside from the dataset: the "company" dashboard aggregates the data of all the salespeople in each of the company's divisions and outputs a row per company, whereas the "division" dashboard for a particular company aggregates the data of all the salespeople in each division and outputs a row per division.
The UIs allow the user to pick a date range for the report dashboard and sort by any of the columns. The columns displayed include:
(Company|Division|Sales Person) Name, Total issued orders, Total sales revenue, Total calls inbound, Total calls outbound.
My issue/plea to SO:
The "legacy" approach (which was shameful yet kinda-sorta-marginally-acceptable when the output was to a daily journal) was to programmatically iterate through the performance data for each of the relevant objects (e.g., each salesperson in a division in a company), find the "last" row on each of the days in the specified date range, and sum the data. However, given the massive dataset and the need to present this data "live" in a UI, I need guidance/examples of how to construct efficient SQL queries against this dataset which will allow for pagination and sorting.
1. Would some kind soul please show me a reasonable query which gets the sum of each of the salesperson performance data columns for a given date range (keeping in mind that for each day, the row to use for the sum is the last one by timestamp for that day, for that salesperson)?
2. And a query which performs query #1 over a range of salespeople (e.g., all the salespeople in a given company), with support for pagination and ordering on a particular column?
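To make the expected result concrete, here is a rough sketch of the "last row per salesperson per day, then sum" logic, run against a subset of the sample rows above. It is only an illustration: it uses an in-memory SQLite database through Python's sqlite3 module rather than MySQL 5 (so salesRevenue is a float here, not a DECIMAL), the alias and result column names are made up, and I make no claim that this shape of query is efficient at the real data volume.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE salesperson_hourly_performance (
        id            INTEGER PRIMARY KEY,
        timestamp     TEXT    NOT NULL,
        salesPersonId INTEGER NOT NULL,
        callsInBound  INTEGER NOT NULL,
        callsOutBound INTEGER NOT NULL,
        issuedOrders  INTEGER NOT NULL,
        salesRevenue  REAL    NOT NULL
    )
    """
)

# A subset of the sample rows: two rows for 2009-05-03, one each for 05-04 and 05-06.
conn.executemany(
    "INSERT INTO salesperson_hourly_performance VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (552394, "2009-05-03 22:00:00", 213859, 15, 17, 14, 10798.0478),
        (551254, "2009-05-03 21:00:00", 213859, 14, 16, 13, 9802.3620),
        (565071, "2009-05-04 13:00:00", 213859, 19, 17, 6, 4200.1710),
        (575070, "2009-05-06 14:00:00", 213859, 0, 2, 1, 120.0000),
    ],
)

# The inner query finds the latest timestamp per salesperson per calendar day
# within the range; the outer query sums only those "last of the day" rows,
# then orders and paginates the per-salesperson totals.
QUERY = """
    SELECT p.salesPersonId,
           SUM(p.issuedOrders)  AS totalIssuedOrders,
           SUM(p.salesRevenue)  AS totalSalesRevenue,
           SUM(p.callsInBound)  AS totalCallsInBound,
           SUM(p.callsOutBound) AS totalCallsOutBound
    FROM salesperson_hourly_performance p
    JOIN (
        SELECT salesPersonId, MAX(timestamp) AS lastTs
        FROM salesperson_hourly_performance
        WHERE timestamp >= ? AND timestamp < ?
        GROUP BY salesPersonId, DATE(timestamp)
    ) last_row
      ON last_row.salesPersonId = p.salesPersonId
     AND last_row.lastTs = p.timestamp
    GROUP BY p.salesPersonId
    ORDER BY totalSalesRevenue DESC
    LIMIT ? OFFSET ?
"""

# Half-open date range [2009-05-03, 2009-05-07), first page of 50 rows.
params = ("2009-05-03 00:00:00", "2009-05-07 00:00:00", 50, 0)
rows = conn.execute(QUERY, params).fetchall()
print(rows)
```

For bob jones over that range, the rows that should count are 22:00 on 05-03, 13:00 on 05-04, and 14:00 on 05-06, giving 21 issued orders, 34 inbound calls, 36 outbound calls, and 15118.2188 in revenue (up to floating-point rounding in this SQLite sketch). Restricting the result to one company, or rolling it up per division or per company, would presumably mean joining through salesperson and division, which is the part I'm least sure how to do efficiently.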
I hope I've included sufficient detail to make clear what I'm asking. Please let me know if you need any additional information.
Many thanks SO SQL gods!
UPDATE:
Added missing keys from salesPerson -> division & from division -> company. Also, fixed datatype of "timestamp" to be DATETIME instead of VARCHAR.