views: 50 | answers: 2

Hi there,

This is the first time I'm dealing with an extremely high-volume situation. It's an ad server backed by MySQL, and the query it runs involves a lot of JOINs and is generally just slow. (This is Rails ActiveRecord, by the way.)

sel = Ads.find(:all,
  :select => '*',
  :joins => "JOIN campaigns ON ads.campaign_id = campaigns.id JOIN users ON campaigns.user_id = users.id LEFT JOIN countries ON countries.campaign_id = campaigns.id LEFT JOIN keywords ON keywords.campaign_id = campaigns.id",
  :conditions => [flashstr + "keywords.word = ? AND ads.format = ? AND campaigns.cenabled = 1 AND (countries.country IS NULL OR countries.country = ?) AND ads.enabled = 1 AND campaigns.dailyenabled = 1 AND users.uenabled = 1", kw, format, viewer['country'][0]],
  :order => order,
  :limit => limit)

My questions:

  1. Is there an alternative to MySQL that has JOIN support but is much faster? (I know there's Postgres/PostgreSQL; still evaluating it.)

  2. Otherwise, would firing up a MySQL instance, loading a local database into memory and re-loading that every 5 minutes help?

  3. Otherwise, is there any way I could switch this entire operation to Redis or Cassandra, and somehow change the JOIN behavior to match the (non-JOIN-able) nature of NoSQL?
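For question 2, one way to prototype the in-memory copy is to rebuild a MEMORY-engine table in the background and swap it in atomically with RENAME TABLE. A hedged sketch, not tested against this schema: the `_mem` naming and the refresh cadence are my assumptions, and note the MEMORY engine doesn't support BLOB/TEXT columns.

```ruby
# Sketch: generate the statements that refresh an in-memory copy of a table.
# These would be run via ActiveRecord::Base.connection.execute from a
# cron/background job every ~5 minutes.
def memory_refresh_sql(table)
  [
    "DROP TABLE IF EXISTS #{table}_mem_new",
    "CREATE TABLE #{table}_mem_new ENGINE=MEMORY AS SELECT * FROM #{table}",
    # RENAME TABLE swaps atomically, so readers never see a half-built copy.
    "RENAME TABLE #{table}_mem TO #{table}_mem_old, #{table}_mem_new TO #{table}_mem",
    "DROP TABLE #{table}_mem_old"
  ]
end

memory_refresh_sql('ads').each { |sql| puts sql }
```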

Thank you!


EDIT: here are more details:

Full executed SQL with flattened select (truncated above):

SELECT campaigns.id, campaigns.guid, campaigns.user_id, campaigns.dailylimit, campaigns.impressions, campaigns.cenabled, campaigns.dayspent, campaigns.dailyenabled, campaigns.fr,
       ads.id, ads.guid, ads.user_id, ads.campaign_id, ads.format, ads.enabled, ads.datafile, ads.data1, ads.data2, ads.originalfilename, ads.aid, ads.impressions,
       countries.id, countries.guid, countries.campaign_id, countries.country,
       keywords.id, keywords.campaign_id, keywords.word, keywords.bid
FROM ads
JOIN campaigns ON ads.campaign_id = campaigns.id
JOIN users ON campaigns.user_id = users.id
LEFT JOIN countries ON countries.campaign_id = campaigns.id
LEFT JOIN keywords ON keywords.campaign_id = campaigns.id
WHERE (keywords.word = 'design' AND ads.format = 10 AND campaigns.cenabled = 1
  AND (countries.country IS NULL OR countries.country = 82)
  AND ads.enabled = 1 AND campaigns.dailyenabled = 1 AND users.uenabled = 1
  AND ads.datafile != '')
ORDER BY keywords.bid DESC
LIMIT 1,1

EXPLAIN/execution plan:

+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+
| id | select_type | table     | type   | possible_keys    | key         | key_len | ref                                | rows | Extra                                        |
+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | keywords  | ref    | campaign_id,word | word        | 257     | const                              |    9 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | ads       | ref    | campaign_id      | campaign_id | 4       | e_development.keywords.campaign_id |    8 | Using where                                  | 
|  1 | SIMPLE      | campaigns | eq_ref | PRIMARY          | PRIMARY     | 4       | e_development.keywords.campaign_id |    1 | Using where                                  | 
|  1 | SIMPLE      | users     | eq_ref | PRIMARY          | PRIMARY     | 4       | e_development.campaigns.user_id    |    1 | Using where                                  | 
|  1 | SIMPLE      | countries | ALL    | campaign_id      | NULL        | NULL    | NULL                               |    4 | Using where                                  | 
+----+-------------+-----------+--------+------------------+-------------+---------+------------------------------------+------+----------------------------------------------+

(this is on a development database, which doesn't have nearly as many rows as the production version.)

DEFINED INDICES:

ads -> id (primary, autoinc) + aid (unique) + campaign_id (index) + user_id (index)
campaigns -> id (primary, autoinc) + user_id (index)
countries -> id (primary, autoinc) + campaign_id (index) + country (index) + user_id (index)
keywords -> id (primary, autoinc) + campaign_id (index) + word (index) + user_id (index)
user -> id (primary, autoinc)
+1  A: 

Have you analysed your execution plan? Have you analysed your indices?

My first guess would be that you need an index on campaigns for user_id, an index on countries for campaign_id, and one on keywords for campaign_id... maybe others. You need to get an execution plan to see what your query is doing.

The other option: how often does the data in this result set change? By the minute? Hour? Day? If it changes daily or hourly (well, every several hours), it might be better to have a secondary table that contains ALL the columns of this result set (or just the columns that aren't likely to change frequently), populated by this query every n hours. Your app would then query the secondary table (perhaps joining with the one table that has frequently changing data); it could be faster that way.
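That secondary-table idea, sketched against the schema in the question. The `eligible_ads` name, the column choice, and the hourly rebuild cadence are my assumptions, not the poster's code:

```ruby
# Sketch: one denormalized row per matching (ad, keyword, country) combo,
# with the static enabled-flags baked in at rebuild time. An hourly job runs
# FLATTEN_SQL (into a _new table, then swaps it in), and the serving path
# runs SERVE_SQL against a single table.
FLATTEN_SQL = <<-SQL
  CREATE TABLE eligible_ads_new AS
  SELECT keywords.word, keywords.bid, ads.format, countries.country,
         ads.id AS ad_id, ads.datafile, ads.data1, ads.data2
  FROM ads
  JOIN campaigns ON ads.campaign_id = campaigns.id
  JOIN users ON campaigns.user_id = users.id
  LEFT JOIN countries ON countries.campaign_id = campaigns.id
  LEFT JOIN keywords ON keywords.campaign_id = campaigns.id
  WHERE ads.enabled = 1 AND campaigns.cenabled = 1
    AND campaigns.dailyenabled = 1 AND users.uenabled = 1
SQL

SERVE_SQL = <<-SQL
  SELECT * FROM eligible_ads
  WHERE word = ? AND format = ? AND (country IS NULL OR country = ?)
  ORDER BY bid DESC LIMIT 1
SQL
```

With a composite index on (word, format, country, bid), the serving query becomes a single-table indexed lookup instead of a five-table join.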

FrustratedWithFormsDesigner
The execution plan is above, and I did already have the indices you mentioned (see above). The data changes roughly every hour. I like the idea of a secondary table, but I have a feeling it'd be a very, very large table since we're doing a lot of joins. Maybe sticking it in memory is a better idea? (We have a lot of memory to work with, btw.) Thank you for your answer!
jkaz
@jkaz: yeah, I think I posted before you posted the plan and index info. Looks good I guess, so time to optimize something else! The in-memory table/cache with periodic updates is still the best thing I've seen so far for this, but after some coffee, maybe I'll have some other ideas. ;)
FrustratedWithFormsDesigner
Thanks, Frustrated!
jkaz
+1  A: 

Database theory and the nominal practice exist to provide a framework for the majority of cases. Not every database usage pattern fits neatly into 3rd normal form; hence the emergence of NoSQL. These databases don't work well in the majority of cases, but they work great in specific ones. One reason they work well is that they DON'T work like a normal RDBMS. Cassandra does have some facility for 'joining', but I don't remember the exact details. If you want a quick understanding, I'd recommend the Digg developers' blog; there's a nice, simple description.

The problem is that I'll bet you a pickle that joining 4 tables in one of them would be slower than in MySQL. And the only way to know for sure would be learning a new DBMS, installing it, tuning the install as well as you can tune MySQL, setting up all your data, and... you'll likely find out MySQL does pretty damn well.

Trying to solve the EXACT SAME problem the EXACT SAME way with a different engine won't cut it... you have to THINK like a NoSQL developer, not an RDBMS developer using NoSQL.

But you can think about the problem as Frustrated suggests.

Why do we have Third Normal Form? Ease of update, mainly: I update one row instead of dozens. It also helps constrain data: if I carefully control the addition of countries in the country table, I'll never get a bad one in the campaign table. Beyond that, 3NF doesn't make querying faster, which is why we invented reporting databases, OLAP, cubes, and star schemas.

The key is that reporting needs a different structure than editing/capturing.

As Frustrated said, determine the speed of change in your underlying data. If you're really adding countries every 5 minutes, I'll be stunned. Campaigns? Probably occasionally. Ads? A couple of times a day. How long would it take to build a fully flattened table and index it? How many rows does that produce? If that cycle time is much shorter than your update frequency... build it and see. Test the query speed. That's a cheaper experiment than going for a whole new DB.
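That "build it and time it" experiment can be scripted with Ruby's stdlib Benchmark. In this sketch, rebuild_flat_table is a placeholder; in a real test it would execute the flattening SQL through ActiveRecord:

```ruby
require 'benchmark'

# Placeholder for the real rebuild, e.g. running a CREATE TABLE ... SELECT
# through ActiveRecord::Base.connection.execute; here it just sleeps briefly.
def rebuild_flat_table
  sleep 0.01
end

elapsed = Benchmark.realtime { rebuild_flat_table }
puts "rebuild took #{'%.3f' % elapsed}s"
# If this is much shorter than how often the data actually changes (roughly
# hourly, per the comments above), the periodic-rebuild approach is viable.
```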

Stephanie Page
I agree, that's an excellent idea and I'll definitely be trying it out. It would certainly be faster than doing all the joins (maybe we can even stick the resulting flat table in a NoSQL store!), so I'll try it and report back with results. Thank you!
jkaz
Mmm... no. You wouldn't put them in a NoSQL table unless you want basically one indexed value (I said basically). At its heart, NoSQL is simply keyword/value pairs, where the value can be a complex type (like an address). If you want to search all addresses by zip there are further complications, though I think recent enhancements can speed that up. But that's been in MySQL for 15 years: an index on zip, voila. If you always use the same filter columns, your index will be obvious; if you mix and match, research how MySQL can use more than one index. I've posted about that here.
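To make the "think like a NoSQL developer" point concrete: instead of joining at read time, precompute the answer per key. A toy sketch in plain Ruby (in production this might be, say, a Redis sorted set per key with bid as the score; all names here are illustrative, not from the poster's code):

```ruby
# Precomputed index: key => ads sorted by bid, descending. The key combines
# the serving query's filter columns (keyword, format, country).
ADS = Hash.new { |h, k| h[k] = [] }

def index_ad(word:, format:, country:, ad_id:, bid:)
  key = "#{word}:#{format}:#{country || 'any'}"
  ADS[key] << { :ad_id => ad_id, :bid => bid }
  ADS[key].sort_by! { |a| -a[:bid] }  # keep highest bid first
end

def top_ad(word, format, country)
  # Merge country-targeted candidates with untargeted ('any') ones and
  # take the highest bid, mirroring the (country IS NULL OR country = ?)
  # branch of the original WHERE clause.
  candidates = ADS["#{word}:#{format}:#{country}"] + ADS["#{word}:#{format}:any"]
  candidates.max_by { |a| a[:bid] }
end

index_ad(:word => 'design', :format => 10, :country => 82,  :ad_id => 1, :bid => 0.40)
index_ad(:word => 'design', :format => 10, :country => nil, :ad_id => 2, :bid => 0.55)
top_ad('design', 10, 82)  # => ad 2 (the untargeted ad with the higher bid)
```

The lookup is a couple of hash reads and a max, with all the join work paid once at index time.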
Stephanie Page