I'm working on selecting locations (city, state) out of a database. The problem is that the query is running a tad slow and I'm not sure how to speed it up. For example:

SELECT CONCAT_WS(', ', city, state) as location, AVG(latitude), AVG(longitude) 
FROM places
WHERE city='New York' AND state='NY'
GROUP BY location

There's going to be a CONCAT on the location regardless, because I want the database to return a pretty, concatenated version of the location (unless there's reason to do this in the code instead). For example, "New York, NY". In reality, a third column is sometimes thrown into the mix (zipcode). I'm running on MySQL.

What would be the best way to optimize this query?

Also, as a secondary question, would adding "DISTINCT" slow down the query in any way? For example:

SELECT DISTINCT CONCAT_WS(', ', city, state) as location, AVG(latitude), AVG(longitude) 
FROM places
WHERE city='New York' AND state='NY'
GROUP BY location

(I'm currently doing this, but in the process of asking this question I realized that DISTINCT is unnecessary given the GROUP BY clause. Still, I wonder whether it makes any difference at all, and whether it's worth rocking the boat just to speed up the query.)

Edit: There are already indexes on city, state, and zipcode, plus combinations thereof (city and zipcode together; state and zipcode each on their own).

A: 

One way to optimize a query like this is to index those columns, so the database can sort and group using a B-tree or hash. The string concatenation may add some overhead as well.

Daniel A. White
I've already got the columns indexed. Do you think it'd be better to concatenate the strings in the code rather than in the db?
+4  A: 

Create a composite index on (state, city) and rewrite your query as this:

SELECT  CONCAT_WS(', ', city, state) AS location, AVG(latitude), AVG(longitude) 
FROM    places
WHERE   state='NY'
        AND city='New York'
GROUP BY
        state, city

Note that for this particular query you can omit the GROUP BY clause entirely:

SELECT  'New York, NY' AS location, AVG(latitude), AVG(longitude) 
FROM    places
WHERE   state='NY'
        AND city='New York'

However, this query will still need it:

SELECT  CONCAT_WS(', ', city, state) AS location, AVG(latitude), AVG(longitude) 
FROM    places
WHERE   state='NY'
GROUP BY
        state, city
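
The composite index might be created like this (the index name is an assumption; the column list must lead with state to match the WHERE and GROUP BY above):

ALTER TABLE places ADD INDEX idx_state_city (state, city);

With this index in place, the WHERE clause can seek directly to the matching range, and the GROUP BY reads rows already in (state, city) order, avoiding a filesort.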
Quassnoi
When you GROUP BY multiple columns, will that have the same logical effect as GROUP BY location?
Yes, except for the really weird edge cases (like when `city = 'New York, NY'` and `state` is NULL, which concatenates to the same string as `city = 'New York', state = 'NY'`)
Quassnoi
Given the way my geocoder works, there's no way that a comma will end up in either state or city (it assumes that a "," is a hard separator between two tokens). So that sounds pretty good to me.
Doing this seems to have greatly increased my Handler_read_rnd_next... Is this a bad thing? I'm trying to figure out why changing the code would affect this variable.
@Roger: could you please build `EXPLAIN` for both queries (your original one and mine)?
Quassnoi
A: 

Adding an index on fields "city" and "state" will help.

Also, depending on the cardinality of each field (the number of distinct values), the MySQL version, the table engine, and other parameters, swapping the order of the conditions in the WHERE clause may affect the query's execution time. I'd try:

WHERE state='NY' AND city='New York'
Nicolas
+2  A: 

It's funny, but almost every problem people have with databases is speed, not storage requirements. That should tell you something :-)

We've had problems like this before and I've said this many times: per-row functions generally do not scale well. The best method we've found to fix them is with insert/update triggers (I'm assuming MySQL has these).

Create another column called pretty_city_state (or whatever) and have the triggers populate it from city and state whenever you insert or update a row. Then create an index on it.
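
In MySQL that setup might look like this sketch (the column size and the trigger/index names are assumptions):

ALTER TABLE places ADD COLUMN pretty_city_state VARCHAR(120);

DELIMITER $$
CREATE TRIGGER places_pretty_ins BEFORE INSERT ON places
FOR EACH ROW
  SET NEW.pretty_city_state = CONCAT_WS(', ', NEW.city, NEW.state)$$

CREATE TRIGGER places_pretty_upd BEFORE UPDATE ON places
FOR EACH ROW
  SET NEW.pretty_city_state = CONCAT_WS(', ', NEW.city, NEW.state)$$
DELIMITER ;

CREATE INDEX idx_pretty_city_state ON places (pretty_city_state);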

This takes advantage of the fact that database rows are generally read far more often than they're written (especially in this case). By evaluating that column on write, you amortize the cost across the writes (thousands) rather than the reads (probably millions). And the write is where the cost should be borne, simply because pretty_city_state only changes when either city or state changes. If you do the concat on every select, you're wasting effort.

Try that and measure the difference - I'm sure you'll find that your selects will scream along at a minimal cost for the triggers (and that cost disappears totally once you have all cities and states within your database).

And yes, I know this breaks 3NF. It's perfectly acceptable to do so for performance reasons if you know what you're doing.

Your query could be done as:

SELECT pretty_city_state as location, AVG(latitude), AVG(longitude) 
FROM places
WHERE city='New York' AND state='NY'
GROUP BY pretty_city_state

or, maybe even faster (measure, don't guess) if you can concatenate the city and state before starting the query:

SELECT pretty_city_state as location, AVG(latitude), AVG(longitude) 
FROM places
WHERE pretty_city_state ='New York, NY'
GROUP BY pretty_city_state
paxdiablo
The issue is that we sometimes change what is concatenated, depending on what is asked. Namely, if the user doesn't include a zipcode, we don't return a zipcode in our response; but if they do, we do. Do you think we should make multiple precomputed CONCAT columns?
Yes, definitely. Storage is cheap, CPU grunt is not. And once you've made the decision to violate 3NF for performance, you may as well go all the way :-)
paxdiablo
For bonus points (since I'm obviously somewhat of an SQL newbie), what's the UPDATE query one could do to add pretty_city_state to each row (supposing I've created this column already; I think I can handle that ;) ).
Something like "update places set pretty_city_state = concat_ws(', ',city,state);"
paxdiablo
But, unless your DBMS is too smart by half, just create the triggers then "update places set city = city;" and let the trigger take care of it.
paxdiablo
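
Putting those last comments together, the backfill might run like this sketch (after the column and triggers exist; either statement works):

-- Backfill existing rows directly:
UPDATE places SET pretty_city_state = CONCAT_WS(', ', city, state);

-- Or touch every row and let the BEFORE UPDATE trigger do the work:
UPDATE places SET city = city;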