views:

1750

answers:

10

First of all, this question regards MySQL 3.23.58, so be advised.

I have 2 tables with the following definition:

Table A: id INT (primary), customer_id INT, offlineid INT

Table B: id INT (primary), name VARCHAR(255)

Now, table A contains in the range of 65k+ records, while table B contains ~40 records. In addition to the 2 primary key indexes, there is also an index on the offlineid field in table A. There are more fields in each table, but they are not relevant (as I see it, ask if necessary) for this query.

I was first presented with the following query (query time: ~22 seconds):

SELECT b.name, COUNT(*) AS orders, COUNT(DISTINCT(a.kundeid)) AS leads
FROM katalogbestilling_katalog a, medie b
WHERE a.offlineid = b.id
GROUP BY b.name

Now, each id in medie is associated with a different name, meaning you could group by id as well as name. A bit of testing back and forth settled me on this (query time: ~6 seconds):

SELECT a.name, COUNT(*) AS orders, COUNT(DISTINCT(b.kundeid)) AS leads
FROM medie a
INNER JOIN katalogbestilling_katalog b ON a.id = b.offline
GROUP BY b.offline;

Is there any way to crank it down to "instant" time (max 1 second at worst)? I added the index on offlineid, but besides that and the re-arrangement of the query, I am at a loss for what to do. The EXPLAIN query shows me the query is using fileshort (the original query also used temp tables). All suggestions are welcome!

+1  A: 

I'm going to guess that your main problem is that you are using such an old version of MySQL. Maybe MySQL 3 doesn't like the COUNT(DISTINCT()).

Alternately, it might just be system performance. How much memory do you have?

Still, MySQL 3 is really old. I would at least put together a test system to see if a newer version ran that query faster.

Neall
I am aware of this, but I am hired as an outside consultant (For a lack of better word) to optimize the performance and this would probably be outside of the scope of the job.
Christian P.
I figured there was some reason. If all else fails though, being able to point to a specific problem with MySQL 3 might help. Good luck either way.
Neall
A: 

How is kundeid defined? It would be helpful to see the full schema for both tables (as generated by MySQL, ie. with indexes) as well as the output of EXPLAIN with the queries above.

The easiest way to debug this and find out what is your bottleneck would be to start removing fields, one by one, from the query and measure how long does it take to run (remember to run RESET QUERY CACHE before running each query). At some point you'll see a significant drop in the execution time and then you've identified your bottleneck. For example:

SELECT b.name, COUNT(*) AS orders, COUNT(DISTINCT(a.kundeid)) AS leads
FROM katalogbestilling_katalog a, medie b
WHERE a.offlineid = b.id
GROUP BY b.name

may become

SELECT b.name, COUNT(DISTINCT(a.kundeid)) AS leads
FROM katalogbestilling_katalog a, medie b
WHERE a.offlineid = b.id
GROUP BY b.name

to eliminate the possibility of "orders" being the bottleneck, or

SELECT b.name, COUNT(*) AS orders
FROM katalogbestilling_katalog a, medie b
WHERE a.offlineid = b.id
GROUP BY b.name

to eliminate "leads" from the equasion. This will lead you in the right direction.

update: I'm not suggesting removing any of the data from the final query. Just remove them to reduce the number of variables while looking for the bottleneck. Given your comment, I understand

SELECT b.name
FROM katalogbestilling_katalog a, medie b
WHERE a.offlineid = b.id
GROUP BY b.name

is still performing badly? This clearly means it's either the join that is not optimized or the group by (which you can test by removing the group by - either the JOIN will be still slow, in which case that's the problem you need to fix, or it won't - in which case it's obviously the GROUP BY). Can you post the output of

EXPLAIN SELECT b.name
FROM katalogbestilling_katalog a, medie b
WHERE a.offlineid = b.id
GROUP BY b.name

as well as the table schemas (to make it easier to debug)?

update #2

there's also a possibility that all of your indeces are created correctly but you have you mysql installation misconfigured when it comes to max memory usage or something along those lines which forces it to use disk sortation.

tpk
I have tried this and removing either of the COUNT's doesn't reduce the query time in any significant way. What I am really looking for is some way to optimize by adding indexes or re-writing the query somehow (sine I need all the data as listed).
Christian P.
A: 

You may get a small increase in performance if you remove the inner join and replace it with a nested select statement also remove the count(*) and replace it with the PK.

SELECT a.name, COUNT(*) AS orders, COUNT(DISTINCT(b.kundeid)) AS leads FROM medie aINNER JOIN katalogbestilling_katalog b ON a.id = b.offline GROUP BY b.offline;

would be

SELECT a.name, COUNT(a.id) AS orders, (SELECT COUNT(kundeid) FROM katalogbestilling_katalog b WHERE b.offline = a.id) AS Leads FROM medie a;

Marcus King
+1  A: 

Unfortunately, mysql 3 doesn't support sub-queries. I suspect that the old version in general is what's causing the slow performance.

Christian P.
A: 

Well if the query is run often enough to warrant the overhead, create an index on table A containing the fields used in the query. Then all the results can be read from an index and it wont have to scan the table.

That said, all my experience is based on MSSQL, so might not work.

TrevorD
A: 

Your second query is fine and 65k+40k rows is not very large :)

Put an new index on katalogbestilling_katalog.offline column and it will run faster for you.

Jonathan
A: 

How long does this take:

SELECT fieldlist FROM A
SELECT fieldlist FROM B

If your mysql is running so slow over the join, perhaps it's better to get the data out with single table scans and sew the data together outside the db. 65k records really isn't that many.

David B
A: 

You could try making sure there are covering indexes defined on each table. A covering index is just an index where each column requested in the select or used in a join is included in the index. This way, the engine only has to read the index entry and doesn't have to also do the corresponding row lookup to get any requested columns not included in the index. I've used this technique with great success in Oracle and MS SqlServer.

Looking at your query, you could try:

one index for medie.id, medie.name
one index for katalogbestilling_katalog.offlineid, katalogbestilling_katalog.kundeid

The columns should be defined in these orders for the index. That makes a difference whether the index can be used or not.

More info here:

Covering Index Info

tyshock
Down to 1.2 sec from ovwr 6 seconds - worked wonders. Thanks!
Christian P.
Glad to hear it helped.
tyshock
A: 

Try adding an index to (offlineid, kundeid)

I added 180,000 BS rows to katalog and 30,000 BS rows to medie (with katalog offlineid's corresponding to medie id's and with a few overlapping kundeid's to make sure the disinct counts work). Mind you this is on mysql 5, so if you don't have similar results, mysql 3 may be your culprit, but from what I recall mysql 3 should be able to handle this just fine.

My tables:

CREATE TABLE `katalogbestilling_katalog` (
  `id` int(11) NOT NULL auto_increment,
  `offlineid` int(11) NOT NULL,
  `kundeid` int(11) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `offline_id` (`offlineid`,`kundeid`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=60001 ;

CREATE TABLE `medie` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=30001 ;

My query:

SELECT b.name, COUNT(*) AS orders, COUNT(DISTINCT(a.kundeid)) AS leads
FROM medie b
INNER JOIN katalogbestilling_katalog a ON b.id = a.offlineid
GROUP BY a.offlineid
LIMIT 0 , 30


"Showing rows 0 - 29 (30,000 total, Query took 0.0018 sec)"

And the explain:

id:  1
select_type:    SIMPLE
table: a
type: index
possible_keys:  NULL
key:    offline_id
key_len:    8
ref: NULL
rows: 180000
Extra: Using index

id: 1
select_type:    SIMPLE
table: b
type: eq_ref
possible_keys:  PRIMARY
key:    PRIMARY
key_len:    4
ref: test.a.offlineid
rows: 1
Extra:
enobrev
A: 

Try optimizing the server itself. See this post by Peter Zaitsev for the most important variables. Some are InnoDB specific, while others are for MyISAM. You didnt mention which engine you were using which might be relevant in this case (count(*) is much faster in MyISAM than in InnoDB for example). Here is another post from same blog, and an article from MySQL Forge

Shinhan