views:

73

answers:

3

I am looking for a good design pattern, or best practice to achieve a situation of "either this query, or else that other query", with best performance and least overhead.

Business logic/program of demans says "all items since Foo", unless that returns less then three items, then "all items". I am refactoring the current code, and cannot come up with a good way to achieve this logic.

Current pseudo-code (Drupal/PHP):

<?php
$result = db_query(
'SELECT n.uid FROM og_ancestry oga ' .
'INNER JOIN node n on n.nid = oga.nid ' .
'WHERE oga.group_nid = %d AND n.created > %d GROUP BY n.uid ' .
'ORDER BY cnt DESC LIMIT %d', $group_nid, $since, $limit);


while ($row = db_fetch_array($result)) {
  $uids[] = $row['uid'];
}

if (count($uids) < 3) {
  $result = db_query(
    'SELECT n.uid FROM og_ancestry oga ' .
    'INNER JOIN node n on n.nid = oga.nid ' .
    'WHERE oga.group_nid = %d GROUP BY n.uid ' .
    'ORDER BY cnt DESC LIMIT %d', $group_nid, $limit);

  while ($row = db_fetch_array($result)) {
    $uids[] = $row['uid'];
  }
}
//...do something with the result.
?>

This code feels "not right", first of all because of DRY: it contains the same query, with one minor difference. I can change that by a bit smarter query-building.

But worse is the fact that I need to hammer the database (query is quite heavy) only to find out that in more then half of the cases, I need to throw away the result and query the db again.

How would you approach such a case?

A: 

Would one query with ORDER BY n.created DESC, cnt DESC LIMIT 3 work? It will get the most recently created items first, and return no more than 3 of them. It's not exactly the same as what you have above, but it's pretty close...

Borealid
I need it exactly the other way around: if the first returns LESS then 3, then run the second one. The result, then, always has MORE then 3 results.
berkes
+2  A: 

You could use a single CASE/WHEN query to see if the first query returns enough. If so, use the THEN block. If not use the ELSE block. That would save you the second roundtrip to the database.

Gordon
I have played around with CASE/WHENs but I cannot find a way to get either CASE WHEN count(*) < 3 WHERE A = X WHEN count (*) >= 3 WHERE B = Y. Probably I don't understand CASE/WHERE, but AFAIK it cannot be used to switch between differen WHERE clauses.
berkes
+4  A: 

If, as you say, "in more then half of the cases, I need to throw away the result and query the db again," then your best bet may be to run only the second query and then evaluate the resultant dataset locally, discarding records if appropriate. It's really more a matter of moving the complexity around than it is of reducing complexity, but at least there's just one trip to the database.

If you ORDER BY n.created DESC, the filtering could simply look at the third record, and if it's earlier than foo, you're done; otherwise, you need to find the first record before foo and discard it and subsequent records.

Carl Manaster
I have benchmarked it a little. And indeed. Filtering in code is by far the fastest. esp since I enforce a LIMIT 5 anyways, so at most I discard 2 items.
berkes
Thanks for letting us know - it's always nice to know what worked, out of all the answers we get on SO. And it's nice that what felt like a good simple answer also turns out to be performant. Sometimes we luck out!
Carl Manaster