views:

237

answers:

4

I have a process that imports a lot of data (950k rows) using inserts that insert 500 rows at a time. The process generally takes about 12 hours, which isn't too bad. Normally doing a query on the table is pretty quick (under 1 second) as I've put (what I think to be) the proper indexes in place. The problem I'm having is trying to run a query when the import process is running. It is making the query take almost 2 minutes! What can I do to make these two things not compete for resources (or whatever)? I've looked into "insert delayed" but not sure I want to change the table to MyISAM.

Thanks for any help!

A: 

You can try importing your data into an auxiliary table and then merging it into the main table. You don't lose performance on your main table, and I think your DB can handle the merge much faster than the repeated insertions.
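A sketch of that staging-table approach, assuming the table from the question is called `properties` (the staging table name is made up here):

```sql
-- Hypothetical staging-table approach: load into an auxiliary table
-- first, then merge into the main table in one set-based pass.
CREATE TABLE properties_staging LIKE properties;

-- ... run the 500-row batch INSERTs against properties_staging ...

-- One merge statement; the main table is only busy for this step.
INSERT INTO properties
SELECT * FROM properties_staging;

DROP TABLE properties_staging;
```

Queries against `properties` only contend with the final `INSERT ... SELECT`, not with the hours of batch inserts.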

darlinton
I ended up doing this and it has helped speed up the import, but the problem with searching during the import still remained. Thanks for the help!
siannopollo
Try to find the real bottleneck. I think it is the hard disk, so there's nothing to do but create a RAID scheme or something similar to solve it. While the import is running, check the system status (CPU, memory, disk activity) at both the OS and DBMS level.
darlinton
+1  A: 

Have you tried using priority hints?

SELECT HIGH_PRIORITY ... and INSERT LOW_PRIORITY ...
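The modifiers go right after the statement keyword. Note these hints only matter for storage engines with table-level locking such as MyISAM, so this is a sketch rather than a guaranteed fix (column values are illustrative):

```sql
-- LOW_PRIORITY: the insert waits until no other clients are reading the table.
INSERT LOW_PRIORITY INTO properties (state, county)
VALUES ('Florida', 'Hillsborough');

-- HIGH_PRIORITY: the read takes priority over queued writes.
SELECT HIGH_PRIORITY * FROM properties WHERE state = 'Florida';
```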

Ben S
I bet his biggest problem is improper indexes, not priority.
zerkms
He stated that he has proper indexing and that the performance issue only occurs during the bulk inserts.
Ben S
everybody lies (c)
zerkms
ha! nice one zerkms :-)
siannopollo
I'll give the priority a shot and see what happens.
siannopollo
I was able to use INSERT LOW_PRIORITY (which didn't fix the problem), but SELECT HIGH_PRIORITY just threw an error on my MySQL version, 5.1.39.
siannopollo
+1  A: 

12 hours to insert 950k rows is pretty heavy duty. How big are these rows? What kind of indexes are on them? Even if the actual data insertion goes quickly, the continual updating of the indexes will definitely cause performance degradation for anything using those table(s) at the time.

Are you doing these imports with the bulk INSERT syntax (insert into tab (x) values (a), (b), (c), etc...) or one INSERT per row? A bulk insert needs a longer index-updating period (as it has to generate index data for 500 rows at once) than a single-row insert. There will no doubt be some sort of internal lock placed on the indexes while the data is updated, in which case you're contending with 950k/500 = 1,900 locking sessions at minimum.
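For clarity, the multi-row form referred to above looks like this (column names and values are illustrative, not from the question):

```sql
-- One statement inserting three rows; the actual import batches 500 at a time.
INSERT INTO properties (state, county)
VALUES ('Florida', 'Hillsborough'),
       ('Florida', 'Pinellas'),
       ('Georgia', 'Fulton');
```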

I found that in some of my bulk-insert scripts (an HTTP log analyzer for some custom data mining), it was quicker to DISABLE the indexes on the relevant tables, then re-enable/rebuild them after the data load was completed. If I remember right, it was about 37 minutes to insert 200,000 rows of hit data with keys enabled, and about 3 minutes with no indexing.
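That trick can be sketched like this; note that DISABLE KEYS only affects non-unique indexes on MyISAM tables, which may be why it doesn't apply directly if the questioner's table is InnoDB:

```sql
ALTER TABLE properties DISABLE KEYS;  -- stop maintaining non-unique indexes

-- ... run the bulk INSERTs ...

ALTER TABLE properties ENABLE KEYS;   -- rebuild the indexes in one pass
```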

Marc B
A: 

So I finally found the slowdown while searching during the import of my data. I had one query like this:

SELECT * FROM `properties` WHERE (state like 'Florida%') and (county like 'Hillsborough%') ORDER BY created_at desc LIMIT 0, 50

and when I ran an EXPLAIN on it, I found out it was scanning around 215,000 rows (even with proper indexes on state and county in place). I then ran an EXPLAIN on the following query:

SELECT * FROM `properties` WHERE (state = 'Florida') and (county = 'Hillsborough') ORDER BY created_at desc LIMIT 0, 50

and saw that it only had to scan 500 rows. Considering that the actual result set was something like 350 rows, I think I've identified the slowdown.

I've made the switch to not using "like" in my queries and am very happy with the snappier results.
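For what it's worth, a composite index covering both equality columns plus the sort column can let MySQL satisfy the WHERE filter and the ORDER BY from a single index instead of sorting separately (the index name below is just an assumption):

```sql
-- Equality on state and county, then created_at for ORDER BY created_at DESC.
CREATE INDEX idx_state_county_created
    ON properties (state, county, created_at);
```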

Thanks to everyone for your help and suggestions. They are much appreciated!

siannopollo
Nice, you found the problem. The LIKE operator is really greedy with your data.
darlinton