I have a PHP script that calls an API method that can easily return 6k+ results.

I use PEAR DB_DataObject in a foreach loop to write each row to the DB.
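
Roughly, the loop follows this pattern (the table and column names here are just made up for illustration):

<?php
// Rough sketch of the current approach: one DB_DataObject insert per API result row.
// 'api_results', 'user_id' and 'value' are placeholder names.
require_once 'DB/DataObject.php';

foreach ($apiResults as $row) {
    $record = DB_DataObject::factory('api_results');
    $record->user_id = $row['user_id'];
    $record->value   = $row['value'];
    $record->insert();   // issues one INSERT statement per row
}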

The script processes users in batches of 20. Some users will only have a few results from the API, but others will have more; worst case, every user has thousands of results.

The loop that calls the API seems to be fine; batches of 20 every 5 minutes work well. My only concern is the thousands of MySQL INSERTs for each user (with a long pause between users for fresh API calls).

Is there a good way to do this? Or am I doing it a good way?!

+1  A: 

Well, is your method producing more load than you can handle? If it's working, then I don't see any reason to change it offhand.

Kalium
A: 

I'm not sure if I'm reading your question correctly. You have one user per row? So are you sending batches of 20 insert statements at a time, or having the database execute one statement at a time?

NoCarrier
This should be a comment to the question, not an answer.
Phantom Watson
Yep, I realize that. However, I didn't have enough points to post a comment.
NoCarrier
And now, thanks to you... I'm even farther away from having enough points.
NoCarrier
ahh, +1 to give you a break :)
Eric Petroelje
+3  A: 

Well, the fastest way would be to issue one INSERT statement with many rows of values, like this:

INSERT INTO mytable (col1, col2) VALUES (?,?), (?,?), (?,?), ...

But that would probably require ditching the DB_DataObject method you are using now. You'll just have to weigh the performance benefits of doing it that way vs. the "ease of use" benefits of using DB_DataObject.
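
For illustration, here is a rough sketch of building such a statement in PHP. Using PDO here is an assumption on my part; the $pdo connection, table, and column names are just placeholders:

<?php
// Sketch only: build one multi-row INSERT for a batch of results.
// $pdo is assumed to be an existing PDO connection.
$rows = array(
    array('alice', 42),
    array('bob',   17),
    // ... one entry per API result
);

// One "(?, ?)" group per row, e.g. "(?, ?), (?, ?), (?, ?)"
$placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));
$sql = "INSERT INTO mytable (col1, col2) VALUES $placeholders";

// Flatten the row values into a single parameter list
$params = array();
foreach ($rows as $row) {
    $params[] = $row[0];
    $params[] = $row[1];
}

$stmt = $pdo->prepare($sql);
$stmt->execute($params);

Keep in mind that MySQL's max_allowed_packet caps the statement size, so very large batches may need to be split into chunks.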

Eric Petroelje
+2  A: 

Like Kalium said, check where the bottleneck is. If it really is the database, you could try the bulk import feature that some DBMSs offer.

In DB2, for example, it is called LOAD. It works without SQL, but reads directly from a named pipe. It is especially designed to be fast when you need to bring a large number of new rows into the database. It can be configured to skip checks and index building, making it even faster.

Ludwig Weinzierl
A: 

Database abstraction layers usually add a pretty decent amount of overhead. I've found that, in PHP at least, it's much easier to use a plain mysql_query for the sake of speed than it is to optimize your library of choice.

Like Eric P and weinzierl.name have said, using a multi-row insert or LOAD will give you the best direct performance.
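
For example, a bare-bones sketch of a multi-row insert through mysql_query might look like this (the table and column names are made up):

<?php
// Sketch: multi-row INSERT using the plain mysql_* functions,
// skipping the abstraction layer. 'api_results' etc. are placeholders.
$values = array();
foreach ($apiResults as $row) {
    $values[] = "('" . mysql_real_escape_string($row['user_id']) . "', '"
              . mysql_real_escape_string($row['value']) . "')";
}

$sql = 'INSERT INTO api_results (user_id, value) VALUES ' . implode(', ', $values);
mysql_query($sql) or die(mysql_error());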

whichdan
A: 

I have a few ideas, but you will have to verify them with testing.

If the table you are inserting into has indexes, try to make sure they are optimized for inserts.

Check out optimization options here: http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

Consider using mysqli directly, or PEAR::MDB2 or PDO. I understand that PEAR::DB is fairly slow, though I don't use PEAR myself, so I can't verify that.
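
For example, a rough sketch with mysqli would prepare the statement once and re-execute it for each row; the credentials, table, and column names below are placeholders:

<?php
// Sketch: prepare once, execute per row with mysqli.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt = $mysqli->prepare('INSERT INTO api_results (user_id, value) VALUES (?, ?)');
$stmt->bind_param('ss', $userId, $value);   // variables bound by reference

foreach ($apiResults as $row) {
    $userId = $row['user_id'];
    $value  = $row['value'];
    $stmt->execute();
}
$stmt->close();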

Eli
A: 

MySQL's LOAD DATA INFILE feature is probably the fastest way to do what you want.

You can take a look at the "Speed of INSERT Statements" chapter in the MySQL documentation.

It covers a lot of ways to improve INSERT performance in MySQL.
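
For illustration, a rough sketch of feeding one user's results to LOAD DATA INFILE from PHP; it assumes LOCAL infile is enabled, and the table and column names are made up:

<?php
// Sketch: dump one user's rows to a temp CSV, then bulk-load it in one statement.
$tmp = tempnam(sys_get_temp_dir(), 'api_rows_');
$fh  = fopen($tmp, 'w');
foreach ($apiResults as $row) {
    fputcsv($fh, array($row['user_id'], $row['value']));
}
fclose($fh);

$sql = "LOAD DATA LOCAL INFILE '" . mysql_real_escape_string($tmp) . "'
        INTO TABLE api_results
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        (user_id, value)";
mysql_query($sql) or die(mysql_error());
unlink($tmp);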

sebthebert
A: 

I don't think a few thousand records should put any strain on your database; even my laptop should handle it nicely. Your biggest concern might become gigantic tables if you don't do any cleanup or partitioning. Avoid premature optimization on that part.

As for your method, make sure you do each user (or batch) in a separate transaction. If you're on MySQL, make sure you're using InnoDB to avoid unnecessary locking. If you're already using InnoDB, Postgres, or another database that supports transactions, you might see a significant performance increase (see the sketch at the end of this answer).

Consider using COPY (at least on Postgres; I'm not sure about MySQL).

Make sure your table is properly indexed (and remove unused indexes). Indexes hurt insert speed.

Remember to optimize/vacuum regularly.
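
For illustration, a rough sketch of one transaction per user's batch; PDO is used here only as an example, and the connection, table, and column names are placeholders:

<?php
// Sketch: wrap each user's inserts in a single transaction (InnoDB assumed).
// $pdo is assumed to be an existing PDO connection.
$stmt = $pdo->prepare('INSERT INTO api_results (user_id, value) VALUES (?, ?)');

foreach ($usersBatch as $userId => $apiResults) {
    $pdo->beginTransaction();
    foreach ($apiResults as $row) {
        $stmt->execute(array($userId, $row['value']));
    }
    $pdo->commit();   // one commit per user instead of one per row
}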