Let's assume three models, standard joins:
class Mailbox < ActiveRecord::Base
has_many :addresses
has_many :domains, :through => :addresses
end
class Address < ActiveRecord::Base
belongs_to :mailbox
belongs_to :domain
end
class Domain < ActiveRecord::Base
has_many :addresses
has_many :mailboxes, :through => :addresses
end
Now obviously, if for any given mailbox you want to know in which domains it has addresses, you have two possible ways to go:
m = Mailbox.first
# either: SELECT DISTINCT domains.id, domains.name FROM "domains" INNER JOIN
# "addresses" ON "domains".id = "addresses".domain_id WHERE
# (("addresses".mailbox_id = 1))
m.domains.all(:select => 'DISTINCT domains.id, domains.name')
# or: SELECT domains.id, domains.name FROM "domains" INNER JOIN "addresses" ON
# "domains".id = "addresses".domain_id WHERE (("addresses".mailbox_id = 1))
# GROUP BY domains.id, domains.name
m.domains.all(:select => 'domains.id, domains.name',
:group => 'domains.id, domains.name')
The problem for me is that I don't know which solution is better. When I don't specify any other conditions, the PostgreSQL query planner favours solution number two (working as expected), but if I add conditions to the queries, it comes down to "Unique" vs. "Group":
With "DISTINCT":
Unique (cost=16.56..16.57 rows=1 width=150)
-> Sort (cost=16.56..16.56 rows=1 width=150)
Sort Key: domains.name, domains.id
-> Nested Loop (cost=0.00..16.55 rows=1 width=150)
-> Index Scan using index_addresses_on_mailbox_id on addresses (cost=0.00..8.27 rows=1 width=4)
Index Cond: (mailbox_id = 1)
-> Index Scan using domains_pkey on domains (cost=0.00..8.27 rows=1 width=150)
Index Cond: (domains.id = addresses.domain_id)
Filter: (domains.active AND domains.selfmgmt)
(9 rows)
With "GROUP BY":
Group (cost=16.56..16.57 rows=1 width=150)
-> Sort (cost=16.56..16.56 rows=1 width=150)
Sort Key: domains.name, domains.id
-> Nested Loop (cost=0.00..16.55 rows=1 width=150)
-> Index Scan using index_addresses_on_mailbox_id on addresses (cost=0.00..8.27 rows=1 width=4)
Index Cond: (mailbox_id = 1)
-> Index Scan using domains_pkey on domains (cost=0.00..8.27 rows=1 width=150)
Index Cond: (domains.id = addresses.domain_id)
Filter: (domains.active AND domains.selfmgmt)
(9 rows)
I'm really unsure how to determine the better way to retrieve these data. My instincts tell me to go with "GROUP BY", but I was unable to find any documentation specific enough to solve this problem.
Should I use ":group" or ":select => 'DISTINCT'"? Is that choice the same with other modern RDBMS like e.g. Oracle, DB2 or MySQL (I don't have access to those, so I can't perform tests)?