views:

56

answers:

2

I'm using a script to process a lot of data records, let's call it process.php. The problem is that I have a huge data set, so to get the job done faster, I want to run multiple instances of this script with

/usr/bin/php process.php start_record end_record &

so I'll have them running in parallel like

/usr/bin/php process.php 0 10000 &

/usr/bin/php process.php 10000 20000 &

/usr/bin/php process.php 20000 30000 &

/usr/bin/php process.php 30000 40000 &

...

I thought the job would get done much faster this way, but after trying it I didn't find it much faster; the speed seemed very close to the sequential way (no concurrency). I don't know if it's because process.php is inserting records into an InnoDB table, or something else.

Any ideas?

+5  A: 

If you need to insert the rows into a database, it will make absolutely no difference. It's the database that's the bottleneck, not your PHP script. You can still only insert one row at a time, so the concurrent instances will just wait for each other.

Daniel Egeberg
@Daniel: Thanks. Anyway to work around this?
Shawn
Good point: first find the bottleneck; then solve it. Then find the next bottleneck...
xtofl
Using extended inserts and transactions might improve performance a bit (but not by a factor of four). Also possibly partitioning the table, or actually creating four separate tables and merging them into one after your main job is done. This all assumes that you actually have four CPU cores available for your scripts.
Mchl
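A rough sketch of the extended-inserts-plus-transaction idea. It uses an in-memory SQLite database via PDO so the snippet runs standalone; with MySQL you would only change the DSN. The `records (id, payload)` schema and the batch size are made up for illustration.

```php
<?php
// Self-contained demo: in-memory SQLite instead of MySQL (only the DSN differs).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)');

// Fake data standing in for the real record set.
$rows = [];
for ($i = 1; $i <= 1000; $i++) {
    $rows[] = ['id' => $i, 'payload' => "record $i"];
}

$batchSize = 200;
$pdo->beginTransaction();
foreach (array_chunk($rows, $batchSize) as $chunk) {
    // One INSERT ... VALUES (?, ?), (?, ?), ... per chunk instead of one per row.
    $placeholders = implode(', ', array_fill(0, count($chunk), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO records (id, payload) VALUES $placeholders");
    $params = [];
    foreach ($chunk as $row) {
        $params[] = $row['id'];
        $params[] = $row['payload'];
    }
    $stmt->execute($params);
}
$pdo->commit();

$count = (int) $pdo->query('SELECT COUNT(*) FROM records')->fetchColumn();
```

Wrapping the batches in one transaction avoids a disk flush per row on InnoDB, and the multi-row INSERTs cut down the per-statement overhead.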
@Shawn: Not particularly. You can try to optimize your database. If you're doing many inserts, prepared statements might be beneficial. There is of course also the option of getting faster hardware.
Daniel Egeberg
+1  A: 

Running concurrently won't help you as the inserts themselves are the bottleneck.

If you are inserting data into a table based on the same query, there are a couple of optimizations you can make. Generally, though, inserts are costly and will take time if you have a large data set.

  1. As mentioned above, use a library like PDO to utilize prepared statements.
  2. If the issue is that the block of inserts is killing the performance of a related web app, you may gain from queueing the inserts and having a script run a block of them at once as a single insert, as described here: http://www.desilva.biz/mysql/insert.html

These probably won't help massively, but they may help a bit.
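For point 1, the win is preparing the INSERT once and re-executing it per record, so the statement is only parsed and planned a single time. A minimal sketch, using an in-memory SQLite PDO connection so it runs standalone (with MySQL only the DSN changes); the `records (id, payload)` schema is an assumption:

```php
<?php
// Self-contained demo: in-memory SQLite stands in for the real MySQL database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)');

$rows = [
    ['id' => 1, 'payload' => 'alpha'],
    ['id' => 2, 'payload' => 'beta'],
    ['id' => 3, 'payload' => 'gamma'],
];

// Prepared once, executed many times.
$stmt = $pdo->prepare('INSERT INTO records (id, payload) VALUES (:id, :payload)');
foreach ($rows as $row) {
    $stmt->execute($row); // array keys match the named placeholders
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM records')->fetchColumn();
```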

AvatarKava