I'd like to select all rows from one table that match "one or more" rows in another table, as efficiently as possible. My current query is:
SELECT identity.id FROM identity
INNER JOIN task ON
task.identityid=identity.id
AND task.groupid IN (78, 122, 345, 12, 234, 778, 233, 123, 33)
Currently, if there are multiple matching tasks, this returns the same identity multiple times (although the performance penalty of eliminating the duplicates later is not too bad). I'd like it to return only one row for each identity that matches one or more of these task groups, and I was wondering whether there is a more efficient way than using DISTINCT or GROUP BY.
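For reference, here is a minimal, self-contained sketch of the two deduplicating variants next to the original join. It uses Python's built-in sqlite3 module with a made-up three-identity data set (the schema and rows are assumptions, not the real tables), so it only demonstrates the result sets, not MySQL's execution plan.

```python
import sqlite3

# Hypothetical miniature of the two tables; sqlite3 stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE identity (id INTEGER PRIMARY KEY);
    CREATE TABLE task (identityid INTEGER NOT NULL, groupid INTEGER NOT NULL);
    INSERT INTO identity (id) VALUES (1), (2), (3);
    -- identity 1: two matching tasks; identity 2: one; identity 3: none
    INSERT INTO task (identityid, groupid) VALUES (1, 78), (1, 122), (2, 345), (3, 999);
""")

# Fixed literal list from the question (not user input, so f-strings are safe).
GROUPS = "(78, 122, 345, 12, 234, 778, 233, 123, 33)"

# The original join: one output row per matching task.
join_ids = [r[0] for r in conn.execute(f"""
    SELECT identity.id FROM identity
    INNER JOIN task ON task.identityid = identity.id
        AND task.groupid IN {GROUPS}""")]

# DISTINCT collapses the duplicates after the join has produced them.
distinct_ids = [r[0] for r in conn.execute(f"""
    SELECT DISTINCT identity.id FROM identity
    INNER JOIN task ON task.identityid = identity.id
        AND task.groupid IN {GROUPS}""")]

# GROUP BY does the same, grouping the joined rows by identity.
grouped_ids = [r[0] for r in conn.execute(f"""
    SELECT identity.id FROM identity
    INNER JOIN task ON task.identityid = identity.id
        AND task.groupid IN {GROUPS}
    GROUP BY identity.id""")]

print(sorted(join_ids))      # [1, 1, 2]
print(sorted(distinct_ids))  # [1, 2]
print(sorted(grouped_ids))   # [1, 2]
```

Identity 1 has two tasks in the listed groups, so the plain join returns it twice, while both DISTINCT and GROUP BY return each matching identity once.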
The trouble with DISTINCT or GROUP BY is that the task table is still scanned for all groupid matches, which are only later reduced to one row per identity by way of a temporary table (sometimes with a filesort). I would rather it did some sort of short-circuit evaluation: once it has found one matching task for an identity, it should not pursue any further task matches for that same identity.
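The difference between the two evaluation styles can be sketched in plain Python. The data, the function names and the examined-task counter are all hypothetical; the counter just makes visible how many task rows each approach looks at.

```python
from typing import Dict, List, Tuple

# The group ids from the question's IN (...) list.
TARGET_GROUPS = {78, 122, 345, 12, 234, 778, 233, 123, 33}


def dedupe_after_join(tasks: List[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Join + DISTINCT style: test every task, then drop duplicate ids."""
    matched = [identity_id for identity_id, group_id in tasks
               if group_id in TARGET_GROUPS]
    return sorted(set(matched)), len(tasks)  # every task was examined


def first_match(tasks_by_identity: Dict[int, List[int]]) -> Tuple[List[int], int]:
    """Short-circuit style: per identity, stop at the first matching task."""
    found: List[int] = []
    examined = 0
    for identity_id, group_ids in tasks_by_identity.items():
        for group_id in group_ids:
            examined += 1
            if group_id in TARGET_GROUPS:
                found.append(identity_id)
                break  # skip this identity's remaining tasks
    return sorted(found), examined


# Hypothetical data: identity id -> groupids of its tasks.
tasks_by_identity = {1: [78, 122, 345], 2: [999, 345], 3: [999]}
flat_tasks = [(i, g) for i, gs in tasks_by_identity.items() for g in gs]

print(dedupe_after_join(flat_tasks))   # ([1, 2], 6)
print(first_match(tasks_by_identity))  # ([1, 2], 4)
```

Both return the same identities, but first_match never looks at identity 1's second and third tasks. Note that first_match loops over every identity, so on its own it does not avoid a full scan of the identity table; it only illustrates the short-circuit.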
I was thinking of using an EXISTS subquery, but I don't know how these are optimised. I'd need the join to start from the task table rather than the identity table, so that I am not doing a full scan of the identity table, which is very large and will have a lot of non-matches.
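A sketch of the EXISTS form, again against a hypothetical sqlite3 miniature of the tables. It only confirms that EXISTS returns each matching identity once, the same set as DISTINCT; it says nothing about MySQL's plan. MySQL 5.6 and later can execute IN (subquery) predicates as semi-joins, and recent MySQL 8.0 releases can apply the same transformation to EXISTS. The optimizer then chooses among strategies such as FirstMatch (stop at the first matching task for each identity) and Materialization or DuplicateWeedout (which can read task first), so running EXPLAIN on the real tables is the way to see which one is picked. The composite index and its name task_groupid_identityid are assumptions.

```python
import sqlite3

# Hypothetical miniature schema; sqlite3 is only used to check results,
# it does not reproduce MySQL's query planner.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE identity (id INTEGER PRIMARY KEY);
    CREATE TABLE task (identityid INTEGER NOT NULL, groupid INTEGER NOT NULL);
    -- Assumed composite index: task rows can be located by groupid,
    -- with identityid available straight from the index.
    CREATE INDEX task_groupid_identityid ON task (groupid, identityid);
    INSERT INTO identity (id) VALUES (1), (2), (3);
    INSERT INTO task (identityid, groupid) VALUES (1, 78), (1, 122), (2, 345), (3, 999);
""")

# EXISTS is a semi-join: each identity is emitted at most once,
# however many of its tasks match.
EXISTS_QUERY = """
    SELECT identity.id FROM identity
    WHERE EXISTS (
        SELECT 1 FROM task
        WHERE task.identityid = identity.id
          AND task.groupid IN (78, 122, 345, 12, 234, 778, 233, 123, 33)
    )
"""

exists_ids = sorted(r[0] for r in conn.execute(EXISTS_QUERY))
print(exists_ids)  # [1, 2]
```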