I have an app that needs to update a large amount of data over a large number of entries. Basically it does some 7,000 inserts and/or updates but it takes a looooong time (like almost 9 minutes... averaging about 0.08 seconds per query). Essentially I'm looking for general speed ups to making multiple such requests (I'm not expecting a specific answer to my vague example... that's just to, hopefully, help explain).

Here are some samples from profiling the requests:

SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (timestamp = '2010-10-15T07:30:00-07:00') AND (planet_id = '2010_Gl_581_c')

INSERT INTO `habitable_planets` (`planet_id`, `timestamp`, `weather_air_temp`, `weather_cell_temp`, `weather_irradiance`, `weather_wind_float`, `biolumin_to_date`, `biolumin_detected`, `craft_energy_usage`, `craft_energy_consumed_to_date`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (timestamp = '2010-10-15T07:45:00-07:00') AND (planet_id = '2010_Gl_581_c')

INSERT INTO `habitable_planets` (`planet_id`, `timestamp`, `weather_air_temp`, `weather_cell_temp`, `weather_irradiance`, `weather_wind_float`, `biolumin_to_date`, `biolumin_detected`, `craft_energy_usage`, `craft_energy_consumed_to_date`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

Repeat ad nauseam (well, about 7,000 times). This is an update that collects data generated at intervals over a 24 hour period and then makes a massive update to the database once per day. Given the limited bit I've shown you are there any suggestions for speeding this process up?

For example... would it make sense to, rather than do a select for each timestamp, do one select for a range all at once and then iterate over them in the script?

Vaguely like:

SELECT `habitable_planets`.* FROM `habitable_planets` WHERE (planet_id = '2010_Gl_581_c')

assign that result to $foo and then do:

foreach ($foo as $bar)
{
    if ($bar['timestamp'] == $baz) // where $baz is the needed timestamp
    {
        // do the insert here
    }
}

EDIT: To add a bit to this, one thing that improved responsiveness in my situation was to replace a bunch of code that checked whether a record existed and then did either an insert or an update with a single INSERT ... ON DUPLICATE KEY UPDATE query. This gave about a 30% speed gain in my particular case, because it cut at least one round trip to the database out of the equation, and over thousands of requests that really adds up.
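For example, assuming a unique key on (planet_id, timestamp) and showing only a subset of the columns for brevity, the check-then-write pair collapses into something like:

INSERT INTO `habitable_planets`
    (`planet_id`, `timestamp`, `weather_air_temp`, `weather_cell_temp`)
VALUES (?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
    `weather_air_temp` = VALUES(`weather_air_temp`),
    `weather_cell_temp` = VALUES(`weather_cell_temp`);

The VALUES() function in the UPDATE clause refers to the value that would have been inserted for that column, so the same statement works for both the insert and the update case.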

+6  A: 

Some useful links:

From MySQL Documentation:

Speed of INSERT Statements says:

  • If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.

  • If multiple clients are inserting a lot of rows, you can get higher speed by using the INSERT DELAYED statement.

  • For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running, if there are no deleted rows in the middle of the data file.

  • When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

  • With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes.
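As a rough sketch of the first point, the two single-row INSERTs from the question could be combined into one statement with multiple VALUES lists (column list shortened and values illustrative):

INSERT INTO `habitable_planets` (`planet_id`, `timestamp`, `weather_air_temp`)
VALUES
    ('2010_Gl_581_c', '2010-10-15 07:30:00', 21.4),
    ('2010_Gl_581_c', '2010-10-15 07:45:00', 21.7);

With 7,000 rows you would batch them in groups of a few hundred per statement rather than one giant statement, to stay under max_allowed_packet.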

NAVEED
+1. Good comprehensive answer.
sberry2A
+2  A: 

If you're not already, use prepared statements (via either mysqli, PDO, or some other DB library that supports them). Reusing the same prepared statement and simply changing the parameter values will help speed things up, since the MySQL server only has to parse the query once.

INSERTs can be batched by providing multiple sets of VALUES - a single query that inserts many rows is much faster than an equivalent number of individual queries each inserting one row.
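A minimal sketch of the prepared-statement approach with PDO (connection details, the $rows array, and the column subset are assumed for illustration):

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT INTO `habitable_planets` (`planet_id`, `timestamp`, `weather_air_temp`)
     VALUES (?, ?, ?)'
);
foreach ($rows as $row)
{
    // The statement is parsed by the server once; only the bound
    // values change on each execution.
    $stmt->execute(array($row['planet_id'], $row['timestamp'], $row['weather_air_temp']));
}

Wrapping the loop in a single transaction (beginTransaction()/commit()) typically helps further, since each row no longer pays for its own commit.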

Amber
+1  A: 

What you suggest is correct. Try to reduce the number of queries you send to the server, since that saves you the round-trip overhead of each request.

Of course, if the initial ranged SELECT returns too much data, the bottleneck may simply shift to PHP instead.

cherouvim
+1  A: 

I'd probably use a stored procedure if possible and just call that from my application code. In addition, I'd think about proper indexing, and possibly about adding an integer surrogate key to replace planet_id, since a 13-byte character key isn't going to be as performant in joins and lookups as a 4-byte integer one.

(I've just read that there are estimated to be 10 million trillion habitable planets in the universe so maybe you'd better use bigint unsigned :P)

drop procedure if exists update_insert_habitable_planets;

delimiter #

create procedure update_insert_habitable_planets
(
in p_planet_id int unsigned,
in p_timestamp datetime
)
proc_main:begin

  start transaction;

  -- bulk update / insert whatever

  commit;

end proc_main #

delimiter ;
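Calling it is then a single statement from the application (the parameter values here are just illustrative):

call update_insert_habitable_planets(581, '2010-10-15 07:30:00');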
f00