I've got two MySQL queries that both insert data into a table. Both have the following format:

CREATE TABLE IF NOT EXISTS `data` (
`id` BIGINT NOT NULL AUTO_INCREMENT UNIQUE,
PRIMARY KEY (`id`)
)
SELECT `field1`, `field2`
WHERE `active` = 1

The only differences between the two queries are how field1 and field2 are determined, and some minor differences in the WHERE conditions. Each can produce 12K records or more.

Now, which will be more efficient?

A. Run both queries separately:

if (mysql_query($query1)) {
    return mysql_query($query2);
}
return false;

B. Or combine the two queries with a UNION and run them once:

$query = 'SELECT `field1`, `field2` WHERE `active` = 1
            UNION 
            SELECT DO_ONE(`field1`), DO_TWO(`field2`) WHERE `active` = 1
            ORDER BY `field1`';
return mysql_query('CREATE TABLE IF NOT EXISTS `data` (
`id` BIGINT NOT NULL AUTO_INCREMENT UNIQUE,
PRIMARY KEY (`id`)
) ' . $query);

The data from one query is useless without the data from the other, so both need to succeed. DO_ONE and DO_TWO are user-defined MySQL functions that change the field data according to some specs.

A: 

Your options do different things. The first returns the result of the second query if the first query executes successfully (which, by the way, is independent of what it returns; it may return an empty rowset). The second returns the results of both queries together. The first option seems pretty useless to me; what you probably want is what you did with the UNION (unless I misunderstood you).

EDIT: After reading your comment, I think you are after something like this:

SELECT TRUE WHERE EXISTS (SELECT field1, field2 ...) AND EXISTS (SELECT field1, field2 ...)

That way you make only one query to the DB, which scales better, takes fewer resources from the connection pool, and doesn't double the impact of latency if your DB engine is on a different server. You still cut the query short if the first condition fails, which is the performance improvement you were looking for with the nested separate queries.

As an optimization, put the condition that executes faster first, in case they are not the same. I assume the one that requires the field calculations would be the slower of the two.

palako
Neither of the options will return result sets. Both will return false if one or both of the queries failed, or a result object otherwise. As these queries only insert data, there is no need to inspect the result objects. The return value is only interpreted as a boolean, to see whether the insertion succeeded.
Jrgns
I guess you mean that in the case of the union, if the first returns empty, the second is going to return empty too because of the queries that you are doing. What UNION will do is put together the results from each of the queries, meaning that you would get a TRUE if one returns records and the other doesn't. Anyway, I'll edit my answer now that I know what you want a bit better.
palako
A: 

The UNION approach should definitely be faster, because making two MySQL API calls from PHP costs more than making one.

aaronmccall
A: 

Aaronmccall's answer is probably the best in general -- the UNION approach does it all in one SQL call. In general that will be the most "efficient", but there could be side issues that could come into play and affect the measure of "efficient" for your particular application.

Specifically, if the UNION requires a temporary table to gather the intermediate results and you are working with very large sets of data, then doing two separate straight SELECTs into the new table might turn out to be more efficient in your particular case. This depends on the internal workings and optimizations of the database engine, which can change with the version you are using.
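One way to see whether the UNION is materialized in a temporary table is to ask MySQL for its plan with EXPLAIN. The following is only a sketch under assumptions: the connection details and the table name `source_table` are placeholders (the queries in the question do not show their FROM clause), and it uses mysqli rather than the mysql_* functions from the question.

```php
<?php
// Assumptions: host, credentials, database and `source_table` are placeholders.
$db = new mysqli('localhost', 'user', 'password', 'database');
if ($db->connect_error) {
    die('Connect failed: ' . $db->connect_error);
}

$query = 'SELECT `field1`, `field2` FROM `source_table` WHERE `active` = 1
          UNION
          SELECT DO_ONE(`field1`), DO_TWO(`field2`) FROM `source_table` WHERE `active` = 1
          ORDER BY `field1`';

// EXPLAIN returns one row per step of the plan. "Using temporary" in the
// Extra column means MySQL builds an intermediate table for that step.
$plan = $db->query('EXPLAIN ' . $query);
if ($plan === false) {
    die('EXPLAIN failed: ' . $db->error);
}
while ($row = $plan->fetch_assoc()) {
    printf("%s | %s | %s\n", $row['select_type'], $row['table'], $row['Extra']);
}
$plan->free();
$db->close();
```

Keep in mind that UNION also removes duplicate rows, which two separate inserts would not do; UNION ALL keeps them, so the two options are only equivalent if duplicates cannot occur or do not matter.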

Ultimately, the only way to answer such a specific question may be to time both approaches in your particular application and environment.
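A rough way to take such timings from PHP is to wrap each option in a closure and measure wall-clock time around it. This is a minimal sketch, not a benchmark suite: `time_call` and the two closures are hypothetical names, and the closure bodies are placeholders for option A and option B from the question.

```php
<?php
// Hypothetical helper (not from the question): runs $fn once and returns
// array(result, elapsed seconds) measured with wall-clock time.
function time_call($fn)
{
    $start = microtime(true);
    $result = call_user_func($fn);
    return array($result, microtime(true) - $start);
}

// Placeholder closures: put option A and option B in the bodies.
$candidates = array(
    'separate' => function () { /* run $query1, then $query2 */ return true; },
    'union'    => function () { /* run the combined UNION query */ return true; },
);

foreach ($candidates as $label => $fn) {
    list($ok, $seconds) = time_call($fn);
    printf("%-8s ok=%s  %.4f s\n", $label, var_export($ok, true), $seconds);
}
```

Since both options insert rows, run each against a scratch copy of the data and repeat the measurement a few times, as the first run may include cache warm-up.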

You also might want to consider that the difference between the time required for two separate queries vs an "all in one" query might be insignificant in the grand scheme of things... you are probably talking about a difference of a few milliseconds (or even microseconds?) unless your mysql database is on a separate server with huge latency issues. If you are doing thousands of these calls in one shot, then the difference might be significant, but if you are only doing one or two of these calls and your application is spending 99.99% of its time executing other things, then the difference between the two probably won't even be noticed.

Lawrence
One of the queries executes in milliseconds. The other takes minutes because of a rather complicated stored procedure. I've decided to move away from the stored procedure to optimize the query.
Jrgns