Imagine the following tables:
create table boxes( id int, name text, ...);
create table thingsinboxes( id int, box_id int, thing enum('apple,'banana','orange');
And the tables look like:
Boxes: id | name 1 | orangesOnly 2 | orangesOnly2 3 | orangesBananas 4 | misc thingsinboxes: id | box_id | thing 1 | 1 | orange 2 | 1 | orange 3 | 2 | orange 4 | 3 | orange 5 | 3 | banana 6 | 4 | orange 7 | 4 | apple 8 | 4 | banana
How do I select the boxes that contain at least one orange and nothing that isn't an orange?
How does this scale, assuming I have several hundred thousand boxes and possibly a million things in boxes?
I'd like to keep this all in SQL if possible, rather than post-processing the result set with a script.
I'm using both postgres and mysql, so subqueries are probably bad, given that mysql doesn't optimize subqueries (pre version 6, anyway).