ansaurus

Question

Answer 1

+2 A:

Knowing the timing for each 1K rows is good, but the real question is how they'll be used.

Are you planning to send all these back to a UI? Google doles out results 25 per page; maybe you should, too.

Are you planning to do calculations in the middle tier? Maybe you can do those calculations on the database and save yourself bringing all those bytes across the wire.

My point is that you may never need to work with 1,000 or one million rows if you think carefully about what you do with them.

You can EXPLAIN PLAN to see what the difference between the two queries is.

duffymo 2010-05-14 23:54:34

Thanks for the response. Unfortunately, one of the requirements is that all of this data gets sent to a map that contains the world of all Projects, so pagination isn't good enough.Assuming that weren't a requirement, though, there are 16 projects per page. Do you think it's not worthwhile to do all of this and instead run 3 or 4 simple queries for each result? That's 64 queries per page, but if they're simple, maybe it's trivial?

Aaron Carlino 2010-05-14 23:59:30

Answer 2

+1 A:

I would go with the first approach. You are allowing the RDBMS to do it's job, rather than trying to do it's job for it.

By creating a temp table, you will always create the full table for each query. If you only want data for one project, you still end up creating the full table (unless you restrict each INSERT statement accordingly.) Sure, you can code it, but it's already becoming a fair amount code and complexity for a small performance gain.

With a SELECT, the db can fetch the appriate amount of data, optimizing the whole query based on context. If other users have queried the same data, it may even be cached (query, and possibly data, depending upon your db). If performance is truly a concern, you might consider using Indexed/Materialized Views, or generating a table on an INSERT/UPDATE/DELETE trigger. Scaling out, you can use server clusters and partioned views - something that I believe will be difficult if you are creating temporary tables.

EDIT: the above is written without any specific rdbms in mind, although the OP added that mysql is the target db.

mdma 2010-05-14 23:55:24

I love the suggestion of using views. I've never created one before, but it seems like this is the perfect application of one.Interesting that temporary tables seem to be a redundant exercise, doing what mysql will do anyway.

Aaron Carlino 2010-05-15 00:09:41

Ok, it's mysql - you might want to add that to your question, as well as the version you are using.

mdma 2010-05-15 00:17:46

Mysql doesn't support indexed views - you can create a view, but it's more like a simple way of hiding details, so the text of the query lives in your database rather than in your code. Views also provide a way of breaking complex queries into smaller pieces. They can hide a lot of detail, which is usually a benefit, but also can lead to a simple select hiding complex queries if not managed carefully.

mdma 2010-05-15 00:27:02

Answer 3

+1 A:

There is a third option which is derived tables:

Select Project.ID AS PROJECT_ID
    , Payments.Total AS TotalPaidBack
    , Coalesce(ContributionStats.DonarCount, 0) As ContributorCount
    , ContributionStats.Total As TotalReceived
From Project
    Left Join   (
                Select C1.RecipientId, Sum(C1.Amount) As Total, Count(Distinct C1.DonarId) ContributorCount
                From Contribution As C1
                Group By C1.RecipientId
                ) As ContributionStats
        On ContributionStats.RecipientId = Project.Project_Id
    Left Join   (
                Select P1.ProjectID, Sum(P1.PaymentAmount) As Total
                From Payment As P1
                Group By P1.RecipientId
                ) As Payments
        On Payments.ProjectId = Project.Project_Id

I'm not sure if it will perform better, but you might give it shot.

Thomas 2010-05-15 00:47:45

Answer 4

+1 A:

A few thoughts:

The derived table idea would be good on other platforms, but MySQL has the same issue with derived tables that it does with views: they aren't indexed. That means that MySQL will execute the full content of the derived table before applying the WHERE clause, which doesn't scale at all.
Option 1 is good for being compact, but syntax might get tricky when you want to start putting the derived expressions in the WHERE clause.
The suggestion of materialized views is a good one, but MySQL unfortunately doesn't support them. I like the idea of using triggers. You could translate that temporary table into a real table that persists, and then use INSERT/UPDATE/DELETE triggers on the Payments and Contribution tables to update the Project Stats table.
Finally, if you don't want to mess with triggers, and if you aren't too concerned with freshness, you can always have the separate stats table and update it offline, having a cron job that runs every few minutes that does the work that you specified in Query #2 above, except on the real table. Depending on the nuances of your application, this slight delay in updating the stats may or may not be acceptable to your users.

Tyler Dumont 2010-05-15 16:26:59

ansaurus

tags:

views:

answers:

What's the most efficient query?

related questions