We are building a MySQL data warehousing application that stores descriptive data (the User ID, Work ID, Machine ID, Event Start Time, and Event End Time columns in the first table below) together with production quantity and duration data (the Output and Time columns), to which aggregate functions (SUM, COUNT, AVG) are applied. We now wish to disaggregate the time data for another type of analysis.

Our current data table design:

+---------+---------+------------+---------------------+---------------------+--------+------+
| User ID | Work ID | Machine ID | Event Start Time    | Event End Time      | Output | Time |
+---------+---------+------------+---------------------+---------------------+--------+------+
| 080025  | ABC123  | M01        | 2008-01-24 16:19:15 | 2008-01-24 16:34:45 |   2120 |  930 | 
+---------+---------+------------+---------------------+---------------------+--------+------+  

The disaggregation we would like to do is to reprocess the table contents at a granularity of minutes, rather than the current per-production-event granularity (one row per Event Start Time / Event End Time pair). Reprocessing the existing row above would produce:

+---------+---------+------------+---------------------+--------+
| User ID | Work ID | Machine ID | Production Minute   | Output |
+---------+---------+------------+---------------------+--------+
| 080025  | ABC123  | M01        | 2008-01-24 16:19    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:20    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:21    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:22    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:23    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:24    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:25    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:26    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:27    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:28    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:29    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:30    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:31    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:32    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:33    |    133 |
| 080025  | ABC123  | M01        | 2008-01-24 16:34    |    133 |
+---------+---------+------------+---------------------+--------+

So the reprocessing would take an existing row created at production-event granularity and re-express it at minute granularity, dropping the now-redundant Event End Time and Time columns in the process. It assumes a constant rate of production and divides Output by the difference in minutes plus one to populate the new table's Output column.
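
As a quick check of that rule against the sample row (the values below are taken straight from the example; this is just the arithmetic, not part of the final query):

# Event Start 16:19, Event End 16:34, Output 2120.
# Clock minutes covered = (34 - 19) + 1 = 16, so per-minute output = 2120 / 16 = 132.5,
# which the target table shows rounded up to 133.
select round( 2120 / (timestampdiff(minute, '2008-01-24 16:19:00', '2008-01-24 16:34:00') + 1) ) as per_minute_output;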

I know this can be done in application code, but can it be done entirely in a MySQL INSERT statement (or otherwise entirely within MySQL)? I am thinking of an INSERT INTO ... SELECT construction but keep getting stuck. An additional complexity is that there are hundreds of machines to include in the operation, so there will be multiple rows (one per machine) for each minute of the day.

Any ideas would be much appreciated. Thanks.

+2  A: 

You can create a table containing a row for every minute from the start of your dataset to the end, and run joins against that:

select p.user_id, p.work_id, p.machine_id, m.production_minute, p.output
from prod_event p
join prod_minute m
  on p.start <= m.production_minute
 and m.production_minute <= p.end;
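
One detail worth checking with that join: the rows in prod_minute land exactly on :00 seconds, so an event starting at 16:19:15 would not match its own first minute under p.start <= m.production_minute. Truncating the start to the minute in the join condition avoids that, and the same query can also spread Output across the minutes covered, as the question asks. A sketch, assuming the prod_event column names used above (start, end, output):

select p.user_id, p.work_id, p.machine_id, m.production_minute,
       p.output / (timestampdiff(minute,
                                 date_format(p.start, '%Y-%m-%d %H:%i:00'),
                                 date_format(p.end,   '%Y-%m-%d %H:%i:00')) + 1) as output
from prod_event p
join prod_minute m
  on m.production_minute >= date_format(p.start, '%Y-%m-%d %H:%i:00')   # start truncated to the minute
 and m.production_minute <= p.end;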

Populating the prod_minute table can be fun:

create table counter ( i int not null auto_increment primary key );
# seed row: with the default SQL mode, inserting 0 into an auto_increment column yields i = 1
insert into counter values ( 0 );
# each repetition doubles the row count: 1, 2, 4, 8, ...
insert into counter select NULL from counter;
# ... repeat until your counter table contains enough minutes
# (about 16 repetitions gives 65,536 rows, roughly 45 days of minutes)

create table prod_minute ( production_minute datetime not null primary key );
insert into prod_minute select date_add( '2000-01-01', interval i minute ) from counter;
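
Once prod_minute is filled, the whole reprocessing asked about above can be run as a single INSERT ... SELECT. The target table name and column types below are assumptions for illustration; the SELECT is the per-minute variant sketched earlier:

# hypothetical target table for the per-minute rows
create table prod_output_by_minute (
    user_id            varchar(10)   not null,
    work_id            varchar(10)   not null,
    machine_id         varchar(10)   not null,
    production_minute  datetime      not null,
    output             decimal(10,2) not null,
    primary key (machine_id, production_minute, user_id, work_id)
);

insert into prod_output_by_minute
    (user_id, work_id, machine_id, production_minute, output)
select p.user_id, p.work_id, p.machine_id, m.production_minute,
       p.output / (timestampdiff(minute,
                                 date_format(p.start, '%Y-%m-%d %H:%i:00'),
                                 date_format(p.end,   '%Y-%m-%d %H:%i:00')) + 1)
from prod_event p
join prod_minute m
  on m.production_minute >= date_format(p.start, '%Y-%m-%d %H:%i:00')
 and m.production_minute <= p.end;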
Martin
Thanks Martin! This approach would work perfectly for a single machine. An additional complexity is that we report on 219 machines, some number between 0 and 219 of which may be running simultaneously. Therefore the production_minute column could have the same value between 1 and 219 times, depending upon the number of machines running in that minute. We could add another column as the PK so as to make production_minute non-unique, but how could we repeat the same time stamp for production_minute a variable number of times based on the number of machines actually running in that minute?
lighthouse65
If all your data is in the single prod_event table, the single join should cover you for all user_id, work_id, machine_id, and output. Give it a go on a small extract - it should just work.
Martin
I see...will try it and post back...thanks again.
lighthouse65
This approach looks like it will work, but I am struggling with the join. Specifically, there is a many:many relationship between the two tables, based on the join logic: when a prod_event row spans more than one minute there are multiple prod_minute rows that join to it; and when there is more than one machine in operation during any given minute, there are multiple prod_event rows that join to the prod_minute row. The prod_event table has 5 million rows and one month has 43,000 minutes in it. Any ideas (other than dramatically shrinking the data set)?
lighthouse65
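For what it is worth, the join does not have to behave like a full cross product: MySQL can read prod_event once and, for each event row, range-scan prod_minute on its primary key, so the size of the result is the number of events times the minutes each one spans, not 5 million times 43,000. Running EXPLAIN on the query will show whether that plan is chosen (column names assumed from the query in the answer above):

explain
select p.user_id, p.work_id, p.machine_id, m.production_minute, p.output
from prod_event p
join prod_minute m
  on p.start <= m.production_minute
 and m.production_minute <= p.end;
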
I am not sure what your question is. Are you asking for a better table design than the one shown in your question? If so, I think you need to alter the question - or ask a new one.
Martin
Also, some column-oriented databases - InfoBright, Vertica - would store and query your final table very efficiently. Both offer trial or Open Source versions: I think you would be surprised.
Martin
Martin, thank you very much for all of your help. Before moving beyond MySQL, our current DBMS, we would want to push this analysis much further in MySQL and consider alternatives only if we could not get it to work there. Thinking through where this thread is going, the real issue now is the most efficient query design for joining two tables with a many:many relationship between the join predicates. This is a different question from my original one and -- I think you are right -- I will pose it as a new question.
lighthouse65