I have an ActiveRecord model called Name which contains names in various languages.

class Name < ActiveRecord::Base
  belongs_to :language
end

class Language < ActiveRecord::Base
  has_many :names
end

Finding names in one language is easy enough:

Language.find(1).names.find(whatever)

But I need to find matching pairs where both language 1 and language 2 have the same name. In SQL, this calls for a simple self-join:

SELECT n1.id,n2.id FROM names AS n1, names AS n2
  WHERE n1.language_id=1 AND n2.language_id=2
    AND n1.normalized=n2.normalized AND n1.id != n2.id;
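To make the intended result concrete, here is a minimal pure-Ruby sketch of what that self-join computes. It assumes a hypothetical `NameRow` struct as an in-memory stand-in for the `names` table, so no ActiveRecord or database is involved. Each result is a pair of ids, one from each language, which is the shape the question asks for.

```ruby
# Hypothetical in-memory stand-in for rows of the `names` table (not ActiveRecord).
NameRow = Struct.new(:id, :language_id, :normalized)

# Mirrors the SQL self-join: every (n1, n2) combination where n1 is in lang_a,
# n2 is in lang_b, the normalized values are equal and the ids differ.
def self_join_pairs(rows, lang_a, lang_b)
  left  = rows.select { |r| r.language_id == lang_a }
  right = rows.select { |r| r.language_id == lang_b }
  left.product(right)
      .select { |n1, n2| n1.normalized == n2.normalized && n1.id != n2.id }
      .map { |n1, n2| [n1.id, n2.id] }
end

rows = [
  NameRow.new(1, 1, "joni"),
  NameRow.new(2, 2, "joni"),
  NameRow.new(3, 1, "maria"),
  NameRow.new(4, 2, "anna")
]
p self_join_pairs(rows, 1, 2)  # => [[1, 2]]
```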

How can I do a query like this with ActiveRecord? Note that I need to find pairs of names (= both sides of the match), not just a list of names in language 1 that happens to match with something.

For bonus points, replace n1.normalized=n2.normalized with n1.normalized LIKE n2.normalized, since the field may contain SQL wildcards.
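One way to reason about the LIKE variant is to translate each pattern into a regular expression. This is only an illustrative sketch, not what the database does internally: `like_to_regexp` and `like?` are hypothetical helpers, only the default `%` and `_` wildcards are handled (no ESCAPE clause), and matching here is case-sensitive, whereas real LIKE case sensitivity depends on the database and collation. As in `n1.normalized LIKE n2.normalized`, the pattern is the right-hand operand.

```ruby
# Hypothetical helper: translate a SQL LIKE pattern into an anchored Regexp.
# Assumptions: only the default wildcards % and _ are supported, there is no
# ESCAPE clause, and matching is case-sensitive.
def like_to_regexp(pattern)
  body = pattern.each_char.map do |ch|
    case ch
    when "%" then ".*"      # % matches any run of characters, including none
    when "_" then "."       # _ matches exactly one character
    else Regexp.escape(ch)  # everything else is literal (e.g. "." stays a dot)
    end
  end.join
  # MULTILINE lets "." match newlines too, so the wildcards span line breaks.
  Regexp.new("\\A#{body}\\z", Regexp::MULTILINE)
end

# Mirrors `value LIKE pattern` under the simplified rules above.
def like?(value, pattern)
  like_to_regexp(pattern).match?(value)
end

p like?("joni", "j_ni")      # => true
p like?("jonathan", "jon%")  # => true
p like?("yoni", "jon%")      # => false
p like?("jxni", "j.ni")      # => false (the dot is literal)
```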

I'm also open to ideas about modeling the data differently, but I'd prefer to avoid having separate tables for each language if I can.

+1  A: 

It sounds like you might want to use a many-to-many relationship between Language and Name instead of has_many/belongs_to.

>> Language.create(:name => 'English')
 => #<Language id: 3, name: "English", created_at: "2010-09-04 19:15:11", updated_at: "2010-09-04 19:15:11"> 
>> Language.create(:name => 'French')
 => #<Language id: 4, name: "French", created_at: "2010-09-04 19:15:13", updated_at: "2010-09-04 19:15:13"> 
>> Language.first.names << Name.find_or_create_by_name('Dave')
 => [#<Name id: 3, name: "Dave", language_id: 3, created_at: "2010-09-04 19:16:50", updated_at: "2010-09-04 19:16:50">] 
>> Language.last.names << Name.find_or_create_by_name('Dave')
 => [#<Name id: 3, name: "Dave", language_id: 4, created_at: "2010-09-04 19:16:50", updated_at: "2010-09-04 19:16:50">]
>> Language.first.names.first.languages.map(&:name)
 => ["English", "French"] 

This extra level of normalization should make what you are trying to do easier.

Dave Pirotte
Ooh, interesting. The problem is that, e.g., Finnish 'Joni' and Hebrew 'Yoni' are actually different names with different properties (spelling in the original script, etc.) that just happen to share the same normalized field; they are not a single name.
jpatokal
+1  A: 

Try this:

ids = [1,2]
Name.all(:select    => "names.id, n2.id AS id2",
         :joins     => "JOIN names AS n2 
                              ON n2.normalized = names.normalized AND 
                                 n2.language_id != names.language_id AND
                                 n2.language_id IN (%s)" % ids.join(','),
         :conditions => ["names.language_id IN (?)", ids]
).each do |name|
  p "id1 : #{name.id}"
  p "id2 : #{name.id2}"
end

PS: Make sure you sanitize the parameters passed to the join condition.
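As a sketch of what that sanitizing could look like, assuming the ids are meant to be integers, the hypothetical helpers below coerce each id with `Integer()`, which raises on anything that is not a clean integer (unlike `String#to_i`, which silently truncates or returns 0). The `:conditions` part of the query already uses the `?` placeholder form, so only the hand-built join string needs this treatment.

```ruby
# Hypothetical helper: accept only clean integer ids before they are
# interpolated into a raw SQL string. Integer() raises ArgumentError on
# input such as "1) OR (1=1", whereas "1) OR (1=1".to_i would quietly give 1.
def sanitize_ids(ids)
  ids.map { |id| id.is_a?(Integer) ? id : Integer(id.to_s, 10) }
end

# Builds the hand-written part of the join condition from sanitized ids.
def language_filter(ids)
  "n2.language_id IN (#{sanitize_ids(ids).join(',')})"
end

puts language_filter([1, "2"])  # prints n2.language_id IN (1,2)

begin
  language_filter(["1) OR (1=1"])
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```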

KandadaBoggu
Well, that certainly works (after fixing a minor typo; it should be `:joins => "JOIN names as...`), but it only returns Name objects in Language 1 (with `id2` tacked on). Fetching the objects for names in Language 2 requires calling Name.find(name.id2) for every match, which causes a pretty big performance hit. Any way around this?
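One common way to avoid a separate `Name.find` per match is to collect every id from the result rows, load them all in a single batched query, and pair them up in Ruby. The sketch below assumes the join returns `[id1, id2]` rows; `pairs_with_records` and the `loader` lambda are hypothetical, with plain hashes standing in for Name records. With ActiveRecord the loader might be something like `lambda { |ids| Name.find(ids) }`, since `find` accepts an array of ids, but check how your Rails version behaves when some ids are missing.

```ruby
# rows: [id1, id2] pairs as returned by the self-join query.
# loader: any callable that fetches many records in one round trip
# (hypothetical; plain hashes stand in for Name records here, so a real
# record would expose rec.id instead of rec[:id]).
def pairs_with_records(rows, loader)
  ids = rows.flatten.uniq
  by_id = loader.call(ids).each_with_object({}) { |rec, h| h[rec[:id]] = rec }
  rows.map { |id1, id2| [by_id.fetch(id1), by_id.fetch(id2)] }
end

# Fake "table" plus a loader that counts how often it is called.
table = {
  1 => { id: 1, name: "Joni" },
  2 => { id: 2, name: "Yoni" },
  3 => { id: 3, name: "Maria" },
  4 => { id: 4, name: "Mariya" }
}
calls = 0
loader = lambda do |ids|
  calls += 1
  ids.map { |id| table.fetch(id) }
end

pairs = pairs_with_records([[1, 2], [3, 4]], loader)
p pairs.map { |a, b| [a[:name], b[:name]] }  # => [["Joni", "Yoni"], ["Maria", "Mariya"]]
p calls                                      # => 1
```

The trade-off is one extra query in total instead of one per match, at the cost of holding all matched records in memory at once.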
jpatokal
I have updated the answer, take a look
KandadaBoggu
OK, that returns a list of all matches in both languages (after adding `AND names.language_id != n2.language_id` to filter out self-matches), but it's a much slower query and it returns a single giant list instead of a list of pairs -- I still need to use Name.find(name.id2) to figure out a name's matching pair.
jpatokal
How many rows are returned in the list? Ideally, this should return one row per match (assuming you do not have duplicate entries for the same key). What data are you trying to get in your second find? You can update the select list to add whatever fields you need from the `names` table. I have updated the answer. Maybe this time it will work.
KandadaBoggu