views:

126

answers:

2

Hello, please I have the same problem as I found here

http://stackoverflow.com/questions/409705/mysql-selecting-data-from-multiple-tables-all-with-same-structure-but-different ,

I have to select data from many MySQL tables with identical structure, but different data (split up into table_0, table_1, table_2 etc to table_5 to distribute millions of records of data).

The hardware generating the data records for each device moves from table to table according to timestamp field, which is NOT unique. e.g. 50 records in table_0 may have the same timestamp. When the data gets to the end of table_5, it goes back to table_0 to start overwriting the data there. I need to get the data on each device within a time range.

Each table's data columns (for table_0, table_1... up to table_5):
timestamp, robotGroupID, robotID, sensor1, sensor2, sensor3, ... (many of them)

However the tables are HUGE and the UNION ALL (I read its faster than DISTINCT) takes forever to execute, even with just two tables let alone 6. e.g. I will illustrate for two tables below.

MySQL statement in PHP: (illustrated for just sensor 1, sensor 2 and sensor 3)
(SELECT sensor1, sensor2, sensor3 FROM table_0 WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop)
UNION ALL
(SELECT sensor1, sensor2, sensor3 FROM table_1 WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop)

N.B it is the exact same query except for the table name. Sensor data for a robot within a time range may span none, one, or more of the tables at once.

I cannot use LIMIT because the number of reports from robots within each time range cannot be known ahead of time. I cant use the MERGE STORAGE ENGINE cos I only have read-only access to the company's database.

I have an idea to use count(robotID) or so on each table to check before running queries but Im not sure how to go about this cos I'm quite a novice.

Please how do you think I can make this work faster for 6 tables and many more columns since there are many more columns than illustrated? Thanks in advance!

A: 

I must confess Im still a novice PHP/MySQL coder, but with many ideas; so my code is probably "dirty".

So I solved the problem this way in order to move forward, but please better solutions are welcome. As for any strange syntax, I am using a database class built upon the PHP PDO because I am using many different RBDMS types on this project.

For the $myQuery_start variable, I added the names of the other columns as well as sensors 1 to 3.
$myQuery_start = "(SELECT sensor1, sensor2, sensor3 FROM ";
$myQueryCount_start = "(SELECT COUNT(*) FROM ";
$myQuery_stop = " WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop.")";

$count_0 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_0".$myQuery_stop)->fetchColumn();
$count_1 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_1".$myQuery_stop)->fetchColumn();
$count_2 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_2".$myQuery_stop)->fetchColumn();
$count_3 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_3".$myQuery_stop)->fetchColumn();
$count_4 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_4".$myQuery_stop)->fetchColumn();
$count_5 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_5".$myQuery_stop)->fetchColumn();

And now I check to see if UNION ALL needs to be appended to each table's query or not. No need to have a UNION ALL if there is no data record to attach in the next table.
$union_0 = (($count_1 + $count_2 + $count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_1 = (($count_2 + $count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_2 = (($count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_3 = (($count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_4 = (($count_5) > 0)?" UNION ALL ":"";

and now we build up the table queries and combine to form the full query

$query_0 = ($count_0 > 0)?$myQuery_start."ip_minute_stats_0".$myQuery_stop.$union_0:"";
$query_1 = ($count_1 > 0)?$myQuery_start."ip_minute_stats_1".$myQuery_stop.$union_1:"";
$query_2 = ($count_2 > 0)?$myQuery_start."ip_minute_stats_2".$myQuery_stop.$union_2:"";
$query_3 = ($count_3 > 0)?$myQuery_start."ip_minute_stats_3".$myQuery_stop.$union_3:"";
$query_4 = ($count_4 > 0)?$myQuery_start."ip_minute_stats_4".$myQuery_stop.$union_4:"";
$query_5 = ($count_5 > 0)?$myQuery_start."ip_minute_stats_5".$myQuery_stop:"";

Then concatenated:
$myQuery = $query_0.$query_1.$query_2.$query_3.$query_4.$query_5;
And finally $myQuery is executed to produce all the data as required.

At least this is roughly 8 times faster than the previous way I used UNION ALL, so I think this is valid. Any suggested further optimization?

Ticabo
Please, more answers (or edits on the above answer) are welcome. My Boss still thinks this runs too slowly :(.I want to know I have tried my best before I move on. Thanks.
Ticabo
+1  A: 

Are the fields RobotID and Timestamp indexed?

I would add a multi-field index of ( RobotId, timestamp ) at the very least.

You say you have read only access to the tables, so can you request this index to be added? I'm sure it will help in both your original and updated queries posted.

Dave Rix
Thanks a lot Dave Rix. But, no, many companies use this same database so my company's request was flatly dismissed. I wish I could add such an index! Thanks :)
Ticabo