I need to pull several rows from a table and process them in two ways:

  • aggregated on a key
  • row-by-row, sorted by the same key

The table looks roughly like this:

table (
   key,
   string_data,
   numeric_data
)

So I'm looking at two approaches to the function I'm writing.

The first would pull the aggregate data with one query, and then query again inside a loop for each set of row-by-row data (the following is PHP-like pseudocode):

$rows = query(
        "SELECT key, SUM(numeric_data)
         FROM table
         GROUP BY key"
    );

foreach ($rows as $row) {
    <process aggregate data in $row>

    $key = $row['key'];
    $row_by_row_data = handle_individual_rows($key);
}

function handle_individual_rows($key)
{
    $rows = query(
            "SELECT string_data
             FROM table WHERE key=?",
            $key
        );

    <process $rows one row at a time>

    return $processed_data;
}
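For concreteness, the first option could be sketched as runnable code like this (Python, with an in-memory SQLite table standing in for the real database; the sample data is made up):

```python
import sqlite3

# Throwaway in-memory table matching the rough schema above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key TEXT, string_data TEXT, numeric_data REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("a", "x", 1.0), ("a", "y", 2.0), ("b", "z", 5.0)])

def handle_individual_rows(key):
    # One extra round trip per key -- the "query inside a loop" shape.
    return [s for (s,) in conn.execute(
        "SELECT string_data FROM t WHERE key = ? ORDER BY string_data",
        (key,))]

results = {}
for key, total in conn.execute(
        "SELECT key, SUM(numeric_data) FROM t GROUP BY key"):
    results[key] = (total, handle_individual_rows(key))

print(results)  # {'a': (3.0, ['x', 'y']), 'b': (5.0, ['z'])}
```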

Or, I could do one big query and let the code do all the work:

$rows = query(
    "SELECT key, string_data, numeric_data
     FROM table"
);

foreach ($rows as $row) {
    <process rows individually and calculate aggregates as I go>
}
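Sketched as runnable code (Python, with an in-memory SQLite table standing in for the real database and made-up sample data), the second option looks like:

```python
import sqlite3

# Throwaway in-memory table matching the rough schema above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key TEXT, string_data TEXT, numeric_data REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("a", "x", 1.0), ("a", "y", 2.0), ("b", "z", 5.0)])

# One big query, sorted on the key; aggregates are accumulated in code
# while each row is processed individually.
totals = {}
for key, string_data, numeric_data in conn.execute(
        "SELECT key, string_data, numeric_data FROM t ORDER BY key"):
    # ... per-row processing of string_data would go here ...
    totals[key] = totals.get(key, 0.0) + numeric_data

print(totals)  # {'a': 3.0, 'b': 5.0}
```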

Performance is not a practical concern in this application; I'm just looking to write sensible and maintainable code.

I like the first option because it's more modular -- and I like the second option because it seems structurally simple. Is one option better than the other or is it really just a matter of style?

+11  A: 

One SQL query, for sure.

This will

  • Save you lots of round trips to the database
  • Allow you to use more efficient GROUP BY methods

Since your aggregates can be computed equally well by the database, it will also be better for maintainability: you have all your result-set logic in one place.

Here is an example of a query that returns every row and calculates a SUM:

SELECT  string_data, numeric_data, SUM(numeric_data) OVER (PARTITION BY key)
FROM    table

Note that this will most probably use parallel access to calculate the SUMs for the different keys, which would be hard to implement in PHP.

Same query in MySQL:

SELECT  `key`, string_data, numeric_data,
        (
        SELECT  SUM(numeric_data)
        FROM    `table` ti
        WHERE   ti.`key` = t_outer.`key`
        ) AS key_sum
FROM    `table` t_outer
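A quick way to sanity-check that both formulations return every row along with its key's total (Python's sqlite3 module standing in for a real server -- SQLite 3.25+, like MySQL 8+, supports window functions; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key TEXT, string_data TEXT, numeric_data REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("a", "x", 1.0), ("a", "y", 2.0), ("b", "z", 5.0)])

# Window-function form: every row, plus its partition's SUM.
windowed = conn.execute("""
    SELECT key, string_data, numeric_data,
           SUM(numeric_data) OVER (PARTITION BY key) AS key_sum
    FROM t
    ORDER BY key, string_data
""").fetchall()

# Correlated-subquery form, for servers without window functions:
# for each outer row, re-aggregate that row's key.
correlated = conn.execute("""
    SELECT key, string_data, numeric_data,
           (SELECT SUM(numeric_data)
            FROM t ti
            WHERE ti.key = t_outer.key) AS key_sum
    FROM t t_outer
    ORDER BY key, string_data
""").fetchall()

print(windowed == correlated)  # True
print(windowed)
# [('a', 'x', 1.0, 3.0), ('a', 'y', 2.0, 3.0), ('b', 'z', 5.0, 5.0)]
```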
Quassnoi
"Allow to use more efficient GROUP BY methods" - how so? Maybe the example wasn't clear but if I only use one SQL query I thought that I could *not* use GROUP BY, because I need to see every row
Ben Dunlap
You need to see every row AND calculate aggregates. You ARE using a GROUP BY, but your own GROUP BY. Your HASH TABLE written in PHP will hardly be more efficient than SQL Server's one written in C.
Quassnoi
Can you show me an example of one query that can get me every row *and* calculate aggregates? Are you thinking of a SELF JOIN?
Ben Dunlap
@bslorence: see post update
Quassnoi
Is SUM OVER a proprietary T-SQL feature?
Ben Dunlap
No, it's also in Oracle and in upcoming PostgreSQL 8.4
Quassnoi
Oops, asked the logically incorrect question. Should have said: "Is SUM OVER a standard SQL feature"? Any rate I'm using MySQL so it's not available to me. And as I think about it I'm not sure I could do a SELF JOIN either, because GROUP BY would affect the result-set of the JOIN. OTOH I could probably get away with a UNION, but that strikes me as a bit kludge-y.
Ben Dunlap
SUM OVER() is just one example. Most set operations (most, not all, but most) can be performed more efficiently and more elegantly on database side using SQL. And there is nothing inherently bad in a self join.
Quassnoi
"more elegantly on database side using SQL" -- which is why I was initially inclined toward my first option (master query and loop full of secondary queries). I'm still not seeing a decent way to get the information I need with only one query, using standard SQL. I didn't suggest that a self join was bad, but that it wouldn't work.
Ben Dunlap
That leaves me with UNION: "(aggregate-query) UNION (individual-query)". But I think I would need to kludge in a literal column or the like, to distinguish the aggregated rows from the individuals.
Ben Dunlap
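Sketching that UNION idea concretely (Python with an in-memory SQLite table as a stand-in; the `is_agg` discriminator column and the sample data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key TEXT, string_data TEXT, numeric_data REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("a", "x", 1.0), ("a", "y", 2.0), ("b", "z", 5.0)])

# Individual rows tagged 0, aggregate rows tagged 1; sorting on the tag
# makes each key's detail rows arrive just before its summary row.
rows = conn.execute("""
    SELECT key, string_data, NULL AS key_sum, 0 AS is_agg FROM t
    UNION ALL
    SELECT key, NULL, SUM(numeric_data), 1 FROM t GROUP BY key
    ORDER BY key, is_agg, string_data
""").fetchall()

for row in rows:
    print(row)
# ('a', 'x', None, 0)
# ('a', 'y', None, 0)
# ('a', None, 3.0, 1)
# ('b', 'z', None, 0)
# ('b', None, 5.0, 1)
```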
See how to do your query in MySQL
Quassnoi
Thank you. That just opened up a new world for me in SQL.
Ben Dunlap
A: 

If performance isn't a concern, I'd go with the second. Seems the tiniest bit friendlier.

If performance were a concern, my answer would be "don't think, profile". :)

chaos
A: 

The second answer is by far more clear, sensible and maintainable. You're saying the same thing with less code, which is usually better.

And I know you said performance is not a concern, but why query the database more times than you have to?

Tim Hoolihan
A: 

I can't be certain from the example here, but I'd like to know if there's a chance to do the aggregation and other processing right in the SQL query itself. In this case, you'd have to evaluate "more maintainable" with respect to your relative comfort level expressing that processing in SQL code vs. PHP code.

Is there something about the additional processing you need to do on each row that would prevent you from expressing everything in the SQL query itself?

D. Lambert
The example is a little oversimplified. I need to see both the result of the aggregation for a couple of columns, and the individual values of several columns in each row. I could do it all in one query with a SELF JOIN, I suppose -- but if we're talking about readability, not sure that's the right decision.
Ben Dunlap
A: 

I don't think you'll find many situations at all where doing a query-per-iteration of a loop is the better choice. In fact, I'd say it's probably a good rule of thumb to never do that.

In other words, the fewer round trips to the database, the better.

Depending on your data and actual tables, you might be able to let SQL do the aggregation work and select all the rows you need with one query.

Peter Bailey
"the fewer round trips to the database, the better." But why, if performance is not a concern?
Ben Dunlap
Because it's a terrible practice to get accustomed to. It's better to learn to do this the "right way" so that when you're working on something where performance IS a concern, you won't have to unlearn these bad habits.
Peter Bailey
I'm sorry -- by "why" I meant "why is it a terrible practice"? Are you just saying that generally speaking it's better to code in a way that respects performance concerns? Or is there some other reason, apart from performance, why "the fewer round trips to the database, the better"?
Ben Dunlap
"..in a way that respects performance concerns?" Yes. Start asking colleagues or looking around the 'net about web application performance - you will almost universally see that the database is where the bulk of performance issues arise.
Peter Bailey
In other words, when I say "performance is not a concern", your response is, "yes it is". I suspect you're right -- and that *is* what you're saying, right?
Ben Dunlap
That's what I'm saying, yes. Even though your current project may only have one user and that user is you because you're running this application from your localhost, it's just not the right way to do these things. As much as I believe that premature optimization should be avoided, there are some things you just don't do, and a query per iteration of a loop - ESPECIALLY when that loop is over the result set of ANOTHER query - is one of those no-nos (yes, there are exceptions to every rule). This is more about the habits you're learning, and not how performant this specific application will be.
Peter Bailey
I'm totally on-board with that. Thanks for the insight.
Ben Dunlap
A: 

One SQL query is probably a better idea. It saves you from having to re-implement relational operations in your own code.

George
A: 

I think you've answered your own question, in a way, because you say you have two different kinds of processing: one aggregated and one row-by-row.

  • if you want to keep everything readable and maintainable, mixing both in a single query doesn't sound right: the query would be answering two different needs, so it won't be very readable

  • even if performance is not an issue, it's faster to do the aggregation on the DB server than in code

  • with only one query, the code handling the result will mix two kinds of processing, handling rows and computing aggregates at the same time, so over time that code will tend to become confusing and buggy

  • the same code might evolve over time; for instance, the row-by-row part can get complex and introduce bugs in the aggregation part, or the other way around

  • if in the future you need to split these two treatments apart, it will be harder to disentangle code that, by then, somebody else wrote ages ago...

Performance considerations aside, in terms of maintainability and readability I'd recommend using two queries.

But keep in mind that even though performance might not be an issue at the moment, it can become one as the database volume grows; it's never a negligible factor in the long term...

Billy
Were you recommending one approach or the other or did you just want to expand on the pros and cons a bit?
Ben Dunlap
Sorry Ben if I wasn't clear enough; in terms of readability and maintainability I was recommending two queries. I'll rephrase the answer.
Billy
A: 

Even if performance is not an issue, your mind is. When a musician practices, every movement is intended to improve the musician's skill. As a developer, you should write every procedure to improve your skill. Iterating through data in loops is sloppy and ugly. SQL queries are elegant. Do you want to develop more elegant code or sloppier code?

Thanks Paul. The consensus is definitely on your side and I did take a one-query approach. Wondering if you could provide a little more background on "iterating through data in loops is sloppy and ugly". What's the argument for that? Or am I beyond help if I need an argument? ;-)
Ben Dunlap