Given the following database table:

WORDS
 alphagram....varchar(15)
 word.........varchar(15) PK
 length.......int

Where:

  • 'alphagram' is the letters of a word in alphabetical order (e.g. AEINNRTT is the alphagram of INTRANET)
  • the primary key is 'word', and there are indexes on alphagram and length

I've found a way to find the anagrams of a given string of letters via SQL. For example, to find the anagrams of AEINNRTT this will work:

select alphagram, word, definition
from words
where length = 8
and alphagram like '%A%' 
and alphagram like '%E%' 
and alphagram like '%I%'
and alphagram like '%NN%' 
and alphagram like '%R%' 
and alphagram like '%TT%'

That will return 1 row (for INTRANET).

If I want to include a known number of wildcards, for example to find the words that can be made from INTRANET plus one blank (wildcard), I just have to change 'length' to the total number of letters plus the number of wildcards.

e.g.

select alphagram, word, definition
from words
where length = 9
and alphagram like '%A%' 
and alphagram like '%E%' 
and alphagram like '%I%'
and alphagram like '%NN%' 
and alphagram like '%R%' 
and alphagram like '%TT%'

...will return 8 rows (ENTERTAIN, INSTANTER, INTEGRANT, INTRANETS, ITINERANT, NATTERING, RATTENING, and TRANSIENT)
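As a rough illustration of the approach above, here is a Python sketch that builds the same kind of query against an in-memory SQLite database. The helper names (`alphagram`, `build_db`, `find_with_blanks`) and the tiny word list are assumptions made for the example, and the `definition` column is left out because it is not part of the table as described.

```python
import sqlite3
from itertools import groupby


def alphagram(word):
    """Letters of the word in alphabetical order, e.g. INTRANET -> AEINNRTT."""
    return "".join(sorted(word.upper()))


def build_db(words):
    """Create an in-memory WORDS table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words ("
        "alphagram VARCHAR(15), word VARCHAR(15) PRIMARY KEY, length INT)"
    )
    conn.execute("CREATE INDEX idx_words_alphagram ON words (alphagram)")
    conn.execute("CREATE INDEX idx_words_length ON words (length)")
    conn.executemany(
        "INSERT INTO words VALUES (?, ?, ?)",
        [(alphagram(w), w.upper(), len(w)) for w in words],
    )
    return conn


def find_with_blanks(conn, letters, blanks=0):
    """Words that use all of `letters` plus exactly `blanks` unknown letters."""
    rack = alphagram(letters)
    # Repeated letters are adjacent in an alphagram, so NN becomes '%NN%'.
    patterns = ["%" + "".join(group) + "%" for _, group in groupby(rack)]
    sql = (
        "SELECT word FROM words WHERE length = ?"
        + " AND alphagram LIKE ?" * len(patterns)
        + " ORDER BY word"
    )
    rows = conn.execute(sql, [len(rack) + blanks] + patterns)
    return [row[0] for row in rows]


# Hypothetical sample data: a few of the words mentioned above plus distractors.
conn = build_db(
    ["INTRANET", "ENTERTAIN", "TRANSIENT", "NATTERING", "RETAINING", "INTERNAL"]
)
print(find_with_blanks(conn, "AEINNRTT"))     # ['INTRANET']
print(find_with_blanks(conn, "AEINNRTT", 1))  # ['ENTERTAIN', 'NATTERING', 'TRANSIENT']
```

Passing the patterns as bound parameters keeps the statement text the same for every rack with the same number of distinct letters.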

My question is this: is there a more efficient way of doing this via SQL only?

This works pretty fast in SQL Server but pretty slow in SQLite. I realise that the %xxx% searches are not fast.

A: 

One idea is to do it like this (for a given word length):

  • split the word into individual characters (probably using SUBSTRING() in a loop, though a better approach is probably worth a separate targeted SO question)

  • generate all permutations and look each one up in the 'word' column

  • PROFIT!

Though, as a commenter said, I'd STRONGLY advise you to do that outside SQL unless you have very good reasons not to or you're just doing this to challenge your skills.
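A minimal sketch of the permutation idea done outside SQL, in Python. It assumes the word list has already been loaded into memory; the function name `anagrams_by_permutation` and the sample words are made up for the example. It handles exact anagrams only, not blanks.

```python
from itertools import permutations


def anagrams_by_permutation(letters, dictionary):
    """Generate every ordering of `letters` and keep those that are real words."""
    # A set removes the duplicate orderings caused by repeated letters.
    candidates = {"".join(p) for p in permutations(letters.upper())}
    return sorted(candidates & set(dictionary))


# Hypothetical sample dictionary.
words = {"INTRANET", "LISTEN", "SILENT", "ENLIST", "TINSEL", "STONE"}
print(anagrams_by_permutation("AEINNRTT", words))  # ['INTRANET']
print(anagrams_by_permutation("listen", words))    # ['ENLIST', 'LISTEN', 'SILENT', 'TINSEL']
```

An 8-letter input already has up to 8! = 40,320 orderings, so the cost grows quickly with word length.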

DVK
+1  A: 

You could create a kind of index column for each entry that has all the letters of the word in alphabetical order and then compare these. Each anagram will have the same index value.
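A small sketch of that comparison, using Python's sqlite3 module and an in-memory table. The column is called `alphagram` here to match the question's schema; the helper names and sample words are assumptions for illustration. It covers exact anagrams only.

```python
import sqlite3


def alphagram(word):
    """Sorted-letter key shared by all anagrams of a word."""
    return "".join(sorted(word.upper()))


def build_db(words):
    """In-memory table with an indexed sorted-letter column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE words ("
        "alphagram VARCHAR(15), word VARCHAR(15) PRIMARY KEY, length INT)"
    )
    conn.execute("CREATE INDEX idx_words_alphagram ON words (alphagram)")
    conn.executemany(
        "INSERT INTO words VALUES (?, ?, ?)",
        [(alphagram(w), w.upper(), len(w)) for w in words],
    )
    return conn


def exact_anagrams(conn, letters):
    """Equality on the key column, which an ordinary index can serve."""
    rows = conn.execute(
        "SELECT word FROM words WHERE alphagram = ? ORDER BY word",
        (alphagram(letters),),
    )
    return [row[0] for row in rows]


# Hypothetical sample data.
conn = build_db(["LISTEN", "SILENT", "ENLIST", "TINSEL", "STONE", "INTRANET"])
print(exact_anagrams(conn, "tinsel"))    # ['ENLIST', 'LISTEN', 'SILENT', 'TINSEL']
print(exact_anagrams(conn, "AEINNRTT"))  # ['INTRANET']
```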

Glenner003