SELECT * FROM `employees` a
LEFT JOIN (SELECT phone1 p1, COUNT(*) c FROM `employees` GROUP BY phone1) b
ON a.phone1 = b.p1;

I'm not sure whether it is this query in particular that has the problem; I have been getting terrible performance in general with this database. The table in question has 120,000 rows. I have tried this particular query remotely and locally, with the MyISAM and InnoDB engines, with different types of joins, and with and without an index on phone1. I can get it to complete successfully in about 4 minutes on a 10,000-row table, but performance degrades dramatically on larger tables. Remotely it loses the connection to the server, and locally it brings my system to its knees and seems to run forever.

This query is only a smaller step in a larger query that couldn't complete, so maybe I should explain the whole scenario. I have one big, flat, ugly table that lists a bunch of people, their contact info, and the info of the companies they work for. I'm trying to normalize the database and intelligently determine which phone numbers apply to individual people and which apply to an office location. My reasoning is that if a phone number occurs multiple times, and the number of occurrences equals the number of times the street address it is attached to occurs, then it must be an office number. So the first step is to count each phone number, grouping by phone number. Normally if you just use COUNT() ... GROUP BY, the non-aggregated columns only show the first record found in each group, so I figured I have to join the full table back to the count table where the phone number matches. This does work, but as I said, I can't successfully complete it on any table much larger than 10,000 rows. This seems pathetic, and this doesn't seem like a crazy query to run. Is there a better way to achieve what I want, do I have to break my large table into 12 pieces, or is there something wrong with the table or the db?
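For what it's worth, the office-number heuristic described above could be sketched roughly like this (a sketch only — it assumes the street address lives in a column called `address`, which is not named in the question):

```sql
-- Hypothetical sketch: a phone number is "probably an office number" if every
-- occurrence of it shares one address, i.e. its per-address count equals the
-- address's total count. Column name `address` is an assumption.
SELECT p.phone1
FROM (SELECT phone1, address, COUNT(*) AS pc
      FROM employees GROUP BY phone1, address) p
JOIN (SELECT address, COUNT(*) AS ac
      FROM employees GROUP BY address) a
  ON p.address = a.address
WHERE p.pc = a.ac
  AND p.pc > 1;
```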

Edit, to answer Rob's request:

id | select_type | table      | type | possible_keys | key | key_len | ref | rows  | Extra
 1 | PRIMARY     | a          | ALL  |               |     |         |     | 60097 |
 1 | PRIMARY     |            | ALL  |               |     |         |     |  9363 |
 2 | DERIVED     | employees1 | ALL  |               |     |         |     | 60097 | Using temporary; Using filesort
A:

If this is for a one-time normalization "cleanup", I would push your subquery into a temporary table, index it, do your join against it, and then drop it when you're done.
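In MySQL terms, that approach might look something like this (a sketch using the table and column names from the question; the temp-table and index names are made up):

```sql
-- Materialize the counts once instead of re-deriving them inside the join.
CREATE TEMPORARY TABLE phone_counts AS
  SELECT phone1 AS p1, COUNT(*) AS c
  FROM employees
  GROUP BY phone1;

-- Index the join column so the join below can do keyed lookups
-- instead of a full scan per row.
ALTER TABLE phone_counts ADD INDEX idx_p1 (p1);

SELECT a.*, b.c
FROM employees a
LEFT JOIN phone_counts b ON a.phone1 = b.p1;

DROP TEMPORARY TABLE phone_counts;
```

The reason this helps: a derived table (the inline subquery) is materialized without indexes, so the outer join has to scan it once for every row of `employees` — roughly 60,000 x 60,000 comparisons. An explicit temporary table lets you add the index yourself.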

great_llama
Holy smokes! I never imagined this would make a difference, but now it is almost instantaneous (4-6 seconds)! At first I forgot to index the temp table, but even that was an improvement; at least it finally started displaying results, in 1,600-row spurts. It would have taken a few minutes without the index. It keeps taking me an embarrassingly long time to realize how I can break my queries down further and further, all the while thinking, why do I even need to do this? Why did I need to do this? Anyway, thanks so much, oh Great Llama.
Moss
Problem is this was already a sub-subquery and I will have to do it again for the second phone number, the fax, and the tollfree number. Oh well.
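One way to avoid repeating the whole exercise per column is to unpivot all the number columns into a single list first and count once (a sketch — `phone2`, `fax`, and `tollfree` are assumed column names based on the comment):

```sql
-- Collapse all phone-type columns into one column, then a single
-- GROUP BY covers every kind of number at once.
CREATE TEMPORARY TABLE all_numbers AS
  SELECT phone1 AS num FROM employees
  UNION ALL SELECT phone2   FROM employees
  UNION ALL SELECT fax      FROM employees
  UNION ALL SELECT tollfree FROM employees;

SELECT num, COUNT(*) AS c
FROM all_numbers
WHERE num IS NOT NULL AND num <> ''
GROUP BY num;
```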
Moss
lather, rinse, repeat... =)
great_llama