This query is very slow, taking about 1 second per record. Sadly, given the size of the database, this is untenable, as it would take days to complete.

Can you suggest a way to speed it up substantially? (I only need to run it once, but in a <1hr window ideally)

update participants set start_time = (select min(time_stamp)
from tasks where participant_id = participants.participant_id)

I don't think we need full table descriptions to suggest a more sensible query structure, but I can post them if required. The database is MySQL.

Many thanks.

+1  A: 

You would need to make sure there is an index on tasks.participant_id. Depending on the number of tasks per participant (if there are really many) you could also add an index on time_stamp, although I don't know if MySQL would make use of it.
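As a rough sketch of the indexing suggestion above (the index names are made up for illustration, and only one of the two is needed): a composite index on both columns should let MySQL answer the min(time_stamp) lookup for each participant from the index alone, though that is worth confirming with EXPLAIN on the real tables.

```sql
-- Option 1: index only the column the subquery filters on.
-- idx_tasks_participant is a hypothetical name.
CREATE INDEX idx_tasks_participant
    ON tasks (participant_id);

-- Option 2: composite index covering both the filter and the MIN() column.
-- idx_tasks_participant_time is a hypothetical name.
CREATE INDEX idx_tasks_participant_time
    ON tasks (participant_id, time_stamp);
```

Building an index on a large table takes time itself, so it is worth allowing for that in the one-hour window.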

lassej
Boom! An index! That will be it for sure. Yes, the tasks.participant_id field is not indexed. Many thanks.
Andrew
A: 

You can do it with a temporary table like this:

create temporary table temp 
select participants.participant_id, min(tasks.time_stamp) as start_time 
from participants inner join tasks on participants.participant_id = tasks.participant_id 
group by participants.participant_id;

update participants, temp 
set participants.start_time = temp.start_time 
where participants.participant_id = temp.participant_id;

This replaces the correlated subquery with a much faster join.
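The same idea can probably be written as a single statement by joining against a derived table instead of a temporary one. This is only a sketch: the aliases p and t and the column alias first_time are made up here, and the table and column names are taken from the question.

```sql
-- Compute each participant's earliest task once, then join it to the update.
UPDATE participants AS p
INNER JOIN (
    SELECT participant_id, MIN(time_stamp) AS first_time
    FROM tasks
    GROUP BY participant_id
) AS t ON t.participant_id = p.participant_id
SET p.start_time = t.first_time;
```

One difference from the original query: participants with no tasks are left untouched by the join, whereas the correlated subquery would set their start_time to NULL.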

Temporary tables are dropped automatically by the MySQL server when the MySQL connection to the client is closed, so depending on your application's connection handling you might want to drop it manually.
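If the connection is pooled or long-lived, the manual cleanup could look like this, assuming the table is named temp as above:

```sql
-- Remove the temporary table explicitly instead of waiting for disconnect.
DROP TEMPORARY TABLE IF EXISTS temp;
```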

Simon
A: 

I think you don't need an inner select

update participants set start_time = min(time_stamp)

Correction:

update participants 
set start_time = min(tasks.time_stamp)
from participants inner join 
tasks on participants.participant_id = tasks.participant_id

and with the correct foreign key and index settings it shouldn't take so long.

ibram
That query does not yield the same result as OP's. Each participant has an individual start_time.
Simon
You're correct. I didn't recognize the tasks table.
ibram