tags:

views:

1627

answers:

13

I am working on someone else's PHP code and seeing this pattern over and over:

(pseudocode)

result = SELECT blah1, blah2, foreign_key FROM foo WHERE key=bar

if foreign_key > 0  
  other_result = SELECT something FROM foo2 WHERE key=foreign_key  
end

The code needs to branch if there is no related row in the other table, but couldn't this be done better by doing a LEFT JOIN in a single SELECT statement? Am I missing some performance benefit? Portability issue? Or am I just nitpicking?

+3  A: 

I'm with you - a single SQL would be better

hamishmcn
+4  A: 

I'm no DBA, but this looks suspiciously like code that I wrote when I first started database programming. I'd imagine this is simply the work of a novice.

Jeffrey
+2  A: 

There's a danger of treating your SQL DBMS as if it was a ISAM file system, selecting from a single table at a time. It might be cleaner to use a single SELECT with the outer join. On the other hand, detecting null in the application code and deciding what to do based on null vs non-null is also not completely clean.

One advantage of a single statement - you have fewer round trips to the server - especially if the SQL is prepared dynamically each time the other result is needed.

On average, then, a single SELECT statement is better. It gives the optimizer something to do and saves it getting too bored as well.

Jonathan Leffler
Yes, we must keep the optimizer happy and fulfilled! :-)
Paul Tomblin
+2  A: 

It seems to me that what you're saying is fairly valid - why fire off two calls to the database when one will do - unless both records are needed independently as objects(?)

Of course while it might not be as simple code wise to pull it all back in one call from the database and separate out the fields into the two separate objects, it does mean that you're only dependent on the database for one call rather than two...

This would be nicer to read as a query:

Select a.blah1, a.blah2, b.something From foo a Left Join foo2 b On a.foreign_key = b.key Where a.Key = bar;

And this way you can check you got a result in one go and have the database do all the heavy lifting in one query rather than two...

Yeah, I think it seems like what you're saying is correct.

BenAlabaster
+2  A: 

This is definitely wrong. You are going over the wire a second time for no reason. DBs are very fast at their problem space. Joining tables is one of those and you'll see more of a performance degradation from the second query then the join. Unless your tablespace is hundreds of millions of records, this is not a good idea.

Mike
A: 

Considering that in one database hit you have all the data you need having one single SQL statement would be better performance 99% of the time. Not sure if the connections is being creating dynamically in this case or not but if so doing so is expensive. Even if the process if reusing existing connections the DBMS is not getting optimize the queries be best way and not really making use of the relationships.

The only way I could ever see doing the calls like this for performance reasons is if the data being retrieved by the foreign key is a large amount and it is only needed in some cases. But in the sample you describe it just grabs it if it exists so this is not the case and therefore not gaining any performance.

KMessenger
+1  A: 

The only "gotcha" to all of this is if the result set to work with contains a lot of joins, or even nested joins.

I've had two or three instances now where the original query I was inheriting consisted of a single query that had so a lot of joins in it and it would take the SQL a good minute to prepare the statement.

I went back into the procedure, leveraged some table variables (or temporary tables) and broke the query down into a lot of the smaller single select type statements and constructed the final result set in this manner.

This update dramatically fixed the response time, down to a few seconds, because it was easier to do a lot of simple "one shots" to retrieve the necessary data.

I'm not trying to object for objections sake here, but just to point out that the code may have been broken down to such a granular level to address a similar issue.

Dillie-O
Good point. And doesn't that also reveal a weakness in the database engine? Programmers should not need to have such strong SQL-fu...
zetetic
I wouldn't call it a weakness in the data engine. Sometimes it lies in the programmer asking absurd things of the database.
Dillie-O
+2  A: 

The most likely explanation is that the developer simply doesn't know how outer joins work. This is very common, even among developers who are quite experienced in their own specialty.

There's also a widespread myth that "queries with joins are slow." So many developers blindly avoid joins at all costs, even to the extreme of running multiple queries where one would be better.

The myth of avoiding joins is like saying we should avoid writing loops in our application code, because running a line of code multiple times is obviously slower than running it once. To say nothing of the "overhead" of ++i and testing i<20 during every iteration!

Bill Karwin
That was my thought too, but the developer does use outer joins in other parts of the system, so it isn't as if he was completely ignorant. Except of course for the use of PHP in the first place (I kid! I kid!)
zetetic
A: 

A single SQL query would lead in more performance as the SQL server (Which sometimes doesn't share the same location) just needs to handle one request, if you would use multiple SQL queries then you introduce a lot of overhead:

Executing more CPU instructions, sending a second query to the server, create a second thread on the server, execute possible more CPU instructions on the sever, destroy a second thread on the server, send the second results back.

There might be exceptional cases where the performance could be better, but for simple things you can't reach better performance by doing a bit more work.

TomWij
A: 

There is not enough information to really answer the question. I've worked on applications where decreasing the query count for one reason and increasing the query count for another reason both gave performance improvements. In the same application!

For certain combinations of table size, database configuration and how often the foreign table would be queried, doing the two queries can be much faster than a LEFT JOIN. But experience and testing is the only thing that will tell you that. MySQL with moderately large tables seems to be susceptable to this, IME. Performing three queries on one table can often be much faster than one query JOINing the three. I've seen speedups of an order of magnitude.

staticsan
Thanks, that's what I suspected. I don't know enough about the system in question to know whether the coding choice was driven by this type of concern, or something else. But it's helpful to know that there could be a good reason for the developer to do what he did.
zetetic
The fact that this was accepted makes me want to cry... If you ever come to the latter conclusion that independent selects are better, promptly nuke your database because it isn't doing its job.
Evan Carroll
A: 

Doing a simple two table join is usually the best way to go after this problem domain, however depending on the state of the tables and indexing, there are certain cases where it may be better to do the two select statements, but typically I haven't run into this problem until I started approaching 3-5 joined tables, not just 2.

Just make sure you have covering indexes on both tables to ensure you aren't scanning the disk for all records, that is the biggest performance hit a database gets (in my limited experience)

Redbeard 0x0A
A: 

You are completely correct that the single query is the way to go. To add some value to the other answers offered let me add this axiom: "Use the right tool for the job, the Database server should handle the querying work, the code should handle the procedural work."

The key idea behind this concept is that the compiler/query optimizers can do a better job if they know the entire problem domain instead of half of it.

JohnFx
A: 

You should always try to minimize the number of query to the database when you can. Your example is perfect for only 1 query. This way you will be able later to cache more easily or to handle more request in same time because instead of always using 2-3 query that require a connexion, you will have only 1 each time.

Daok