I am trying to do full text searching in PostgreSQL 8.3. It worked splendidly, so I added synonym matching (e.g. 'bob' == 'robert') using a synonym dictionary. That works great too. But I've noticed that it apparently allows a word to have only one synonym. That is, 'al' cannot map to 'alan', 'albert' and 'allen' at the same time.

Is this correct? Is there any way to have multiple dictionary matches in a PostgreSQL synonym dictionary?

For reference, here is my sample dictionary file:

bob    robert
bobby  robert
al     alan
al     albert
al     allen

And the SQL that creates the full text search config:

CREATE TEXT SEARCH DICTIONARY nickname (TEMPLATE = synonym, SYNONYMS = nickname);
CREATE TEXT SEARCH CONFIGURATION dxp_name (COPY = simple);
ALTER TEXT SEARCH CONFIGURATION dxp_name ALTER MAPPING FOR asciiword WITH nickname, simple;

What am I doing wrong? Thanks!

A: 

That's a limitation of how the synonym dictionary works. What you can do is turn the mapping around, as in:

bob    robert
bobby  robert
alan   al
albert al
allen  al

It should give the same end result: a search for any one of those names will match the same documents.
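To illustrate why the inverted file gives "the same end result", here is a minimal Python model, an assumption rather than PostgreSQL's actual code: each word is looked up in the inverted nickname table, and anything not found falls through lowercased, roughly imitating the `WITH nickname, simple` mapping from the question. The helper names `normalize` and `matches` are hypothetical.

```python
# Inverted nickname file from the answer: full name -> nickname.
NICKNAMES = {
    "bob": "robert",
    "bobby": "robert",
    "alan": "al",
    "albert": "al",
    "allen": "al",
}

def normalize(word):
    """Map a word to its index term (hypothetical helper).

    Assumption: words missing from the synonym table fall through
    unchanged except for lowercasing, approximating the simple dictionary.
    """
    w = word.lower()
    return NICKNAMES.get(w, w)

def matches(query_word, document):
    """True if the query word's index term occurs in the document."""
    terms = {normalize(tok) for tok in document.split()}
    return normalize(query_word) in terms

doc = "Albert Smith"
for q in ["al", "alan", "albert", "allen"]:
    print(q, matches(q, doc))  # each line ends in True
```

Note the trade-off in this model: because all four names share the single index term `al`, a search for "alan" also matches a document about Albert.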

Magnus Hagander
Hmm. That helps, though I guess it means that there is no possible way to have a many-to-many relationship. For example, this is impossible to rectify:

vin    vincent
vin    vincenzo
vinnie vincent
vinnie vincenzo

Thanks though!
Ryan VanMiddlesworth
A: 

The 8.4 documentation describes dict_xsyn, an extended synonym dictionary that replaces a word with a group of its synonyms. Maybe that will be helpful?

http://www.postgresql.org/docs/8.4/interactive/dict-xsyn.html
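As a rough illustration of "replacing a word with a group of synonyms", here is a small Python model. It is not dict_xsyn's implementation; the `GROUPS` table and the helpers `expand`, `doc_lexemes` and `matches` are hypothetical. Each word expands to a set of lexemes, and a query matches when its set overlaps the document's lexemes, which is one way to approximate the many-to-many vin/vinnie case.

```python
# Hypothetical synonym groups: each word expands to several lexemes.
GROUPS = {
    "al": {"al", "alan", "albert", "allen"},
    "vin": {"vin", "vincent", "vincenzo"},
    "vinnie": {"vinnie", "vincent", "vincenzo"},
}

def expand(word):
    """Return the lexemes a word expands to (itself if no group exists)."""
    w = word.lower()
    return GROUPS.get(w, {w})

def doc_lexemes(document):
    """Union of the expansions of every whitespace-separated token."""
    lexemes = set()
    for tok in document.split():
        lexemes |= expand(tok)
    return lexemes

def matches(query_word, document):
    """Match if any lexeme of the query word appears in the document."""
    return bool(expand(query_word) & doc_lexemes(document))

print(matches("vin", "Vincent Price"))        # True
print(matches("vinnie", "Vincenzo Bellini"))  # True
print(matches("albert", "Alan Turing"))       # False: no group for albert here
```

Because this model expands both the document and the query, vin and vinnie also end up matching each other through vincent and vincenzo; whether that is desirable depends on the application.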

SearchTools-Avi
A: 

Is there a public domain table of those synonyms?

David
A: 

A dictionary must define a functional relationship between words and lexemes; otherwise it won't know which lexeme to return when you lexize a word. In your example, al maps to three different values, so the mapping is multi-valued and the lexize function doesn't know which one to return. As Magnus shows, you can instead lexize from the proper names alan, albert and allen to the nickname al.
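The "functional relationship" requirement can be checked mechanically. The following Python sketch (the helper `find_ambiguous` is hypothetical, not part of PostgreSQL) treats a synonym file as a list of (word, replacement) pairs and reports every word that maps to more than one replacement. The question's file fails the check, while Magnus's inverted file passes.

```python
from collections import defaultdict

def find_ambiguous(pairs):
    """Return {word: sorted replacements} for words with more than one target."""
    targets = defaultdict(set)
    for word, replacement in pairs:
        targets[word].add(replacement)
    return {w: sorted(t) for w, t in targets.items() if len(t) > 1}

# The synonym file from the question.
question_pairs = [
    ("bob", "robert"),
    ("bobby", "robert"),
    ("al", "alan"),
    ("al", "albert"),
    ("al", "allen"),
]

# The inverted file from Magnus's answer.
inverted_pairs = [
    ("bob", "robert"),
    ("bobby", "robert"),
    ("alan", "al"),
    ("albert", "al"),
    ("allen", "al"),
]

print(find_ambiguous(question_pairs))  # {'al': ['alan', 'albert', 'allen']}
print(find_ambiguous(inverted_pairs))  # {}
```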

Remember, however, that the point of an FTS dictionary is not to perform transformations per se but to allow efficient indexing on semantically relevant words. This means that the lexeme need not resemble the original entry in any linguistic sense. Although you're right that a many-to-many relationship is impossible to define, do you really need one? For example, to resolve your vin example:

vin        vin
vincent    vin
vincenzo   vin
vinnie     vin

but you could also do this:

vin        grob
vincent    grob
vincenzo   grob
vinnie     grob

and get the same effect (although why you'd want to is another story).

Thus, if you were to parse a document containing, say, 11 occurrences of variants of the name Vincent, to_tsvector would return a single lexeme, vin, listing all 11 word positions in the former case, and grob with the same positions in the latter.
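A small Python model of that effect, under simplifying assumptions (whitespace tokenization, lowercasing, one synonym table; `to_tsvector_like` is a hypothetical helper, not the real function): every variant collapses to one lexeme that collects the 1-based positions of all occurrences, and the choice of canonical token (vin or grob) changes only the label.

```python
from collections import defaultdict

def to_tsvector_like(document, mapping):
    """Collect 1-based word positions per lexeme, roughly like to_tsvector.

    Assumptions: whitespace tokenization, lowercasing, and a single synonym
    table; real parsing and dictionary chains are more involved.
    """
    positions = defaultdict(list)
    for pos, tok in enumerate(document.split(), start=1):
        w = tok.lower()
        positions[mapping.get(w, w)].append(pos)
    return dict(positions)

# The two mappings from the answer: canonical token "vin" vs "grob".
VIN = {"vin": "vin", "vincent": "vin", "vincenzo": "vin", "vinnie": "vin"}
GROB = {"vin": "grob", "vincent": "grob", "vincenzo": "grob", "vinnie": "grob"}

doc = "Vincent vin Vinnie Vincenzo"
print(to_tsvector_like(doc, VIN))   # {'vin': [1, 2, 3, 4]}
print(to_tsvector_like(doc, GROB))  # {'grob': [1, 2, 3, 4]}
```

PostgreSQL would display the first result in tsvector notation as 'vin':1,2,3,4.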

gvkv