views:

55

answers:

4

I am working on an app which is to suggest alternative words/phrases for input text. I have doubts about what might be a good design for the synonyms table.

Design considerations:

  1. number of synonyms is variable, i.e. football has one synonym (soccer), but in particular has two (particularly, specifically)
  2. if football is a synonym to soccer, the relation exists in the opposite direction as well.
  3. our goal is to query a word and find its synonyms
  4. we want to keep the table small and make adding new words easy

What comes to my mind is a two column design with

  • col a = word and
  • col b = delimited list of synonyms

Is there any better alternative? What about using two tables, one for words and the other for relations?

Edit:
After reading the answers, another approach came to my mind. It's using a single column table with all synonym words, each wrapped in word boundary marker. With that i mean something like
|in particular|particularly|specifically|
Then I'd query the table with
SELECT * FROM synonyms WHERE word LIKE '%|specifically|%'
And in code I trash the preceding and trailing |s and do a split and have the synonyms.

Anything bad in that, that I'm not seeing?

+3  A: 
  • col a = word and
  • col b = delimited list of synonyms

Terrible idea. Each time you add a synonym you have to change it in two places. Just add pairs

(in particular, particularly)
(in particular, specifically)

and enforce that in (a,b) and a < b. That way, there'll be no redundancy.

Using two tables is fine too, and maybe you'll save memory. However, you'll have one more join in your queries.

Artefacto
+1  A: 

I would use the two tables approach, one for words and the other for relations over the single table approach for three reasons.

  1. There will be no duplicate words in the words table.
  2. It's easier to enforce the two-way relation between a word and its synonym.
  3. It's more tedious to write SQL statements that work on delimited lists.

Word table: 2 columns Id, Word

Relation table: 2 column WordId1, WordId2

Words with two synomyms will have one row in the Word table and two rows in the Relation table.

Mr Roys
+1  A: 

Your one table design would have a lot of duplicate synonym lists, but that might be fine for you.

You may want to consider a two table design, mapping all words to a "canonical variation" (as a word) or id (a number):

syn1 -> 0x1234eef3
syn2 -> 0x1234eef3

Then a table mapping id to list of synonyms:

01234eef3 -> (syn1, syn2)
Stephen
A: 

After reading the answers, another approach came to my mind. It's using a single column table with all synonym words, each wrapped in word boundary marker. With that i mean something like |in particular|particularly|specifically| Then I'd query the table with SELECT * FROM synonyms WHERE word LIKE '%|specifically|%' And in code I trash the preceding and trailing |s and do a split and have the synonyms.

Majid