For the purpose of writing an anagram solver the kind of which you linked, the algorithm that you are requesting is not necessary. It is also VERY expensive.
Let's look at a 6-letter word like MONKEY
, for example. All 6 letters of the word are different, so you would create:
- 6*5*4*3*2*1 different 6-letter words
- 6*5*4*3*2 different 5-letter words
- 6*5*4*3 different 4-letter words
- 6*5*4 different 3-letter words
- 6*5 different 2-letter words
- For a total of 1950 words
Now, presumably you're not trying to spit out all 1950 words (e.g. 'OEYKMN') as anagrams (which they are, but most of them are also gibberish). I'm guessing you have a dictionary of legal English words, and you just want to check if any of those words are anagrams of the query word, with the option of not using all letters.
If that is the case, then the problem is simple.
To determine if 2 words are anagrams of each other, all you need to do is count how many times each letters are used, and compare these numbers!
Let's restrict ourself to only 26 letters A-Z, case insensitive. What you need to do is write a function countLetters
that takes a word and returns an array of 26 numbers. The first number in the array corresponds to the count of the letter A
in the word, second number corresponds to count of B
, etc.
Then, two words W1
and W2
are exact anagram if countLetters(W1)[i] == countLetters(W2)[i]
for every i
! That is, each word uses each letter the exact same number of times!
For what I'd call sub-anagrams (MONEY
is a sub-anagram of MONKEY
), W1
is a sub-anagram of W2
if countLetters(W1)[i] <= countLetters(W2)[i]
for every i
! That is, the sub-anagram may use less of certain letters, but not more!
(note: MONKEY
is also a sub-anagram of MONKEY
).
This should give you a fast enough algorithm, where given a query string, all you need to do is read through the dictionary once, comparing the letter count array of each word against the letter count array of the query word. You can do some minor optimizations, but this should be good enough.
Alternatively, if you want utmost performance, you can preprocess the dictionary (which is known in advance) and create a directed acyclic graph of sub-anagram relationship.
Here's a portion of such a graph for illustration:
D=1,G=1,O=1 ----------> D=1,O=1
{dog,god} \ {do,od}
\
\-------> G=1,O=1
{go}
Basically each node is a bucket for all words that have the same letter count array (i.e. they're exact anagrams). Then there's a node from N1
to N2
if N2
's array is <=
(as defined above) N1
's array (you can perform transitive reduction to store the least amount of edges).
Then to list all sub-anagrams of a word, all you have to do is find the node corresponding to its letter count array, and recursively explore all nodes reachable from that node. All their buckets would contain the sub-anagrams.