views:

2198

answers:

6

Algorithm to generate all possible letter combinations of given string down to 2 letters

Trying to create an Anagram solver in AS3, such as this one found here:

http://homepage.ntlworld.com/adam.bozon/anagramsolver.htm

I'm having a problem wrapping my brain around generating all possible letter combinations for the various lengths of strings. If I was only generating permutations for a fixed length, it wouldn't be such a problem for me... but I'm looking to reduce the length of the string and obtain all the possible permutations from the original set of letters for a string with a max length smaller than the original string. For example, say I want a string length of 2, yet I have a 3 letter string of “abc”, the output would be: ab ac ba bc ca cb.

Ideally the algorithm would produce a complete list of possible combinations starting with the original string length, down to the smallest string length of 2. I have a feeling there is probably a small recursive algorithm to do this, but can't wrap my brain around it. I'm working in AS3.

Thanks!

A: 

You want a sort of arrangements. If you're familiar with the permutation algorithm then you know you have a check to see when you've generated enough numbers. Just change that limit:

I don't know AS3, but here's a pseudocode:

st = an array
Arrangements(LettersInYourWord, MinimumLettersInArrangement, k = 1)
  if ( k > MinimumLettersInArrangements )
  {
    print st;
  }

  if ( k > LettersInYourWord )
    return;      

  for ( each position i in your word that hasn't been used before )
    st[k] = YourWord[i];
    Arrangements(<same>, <same>, k + 1);

for "abc" and Arrangements(3, 2, 1); this will print:

ab
abc
ac
acb
...

If you want those with three first, and then those with two, consider this:

st = an array
Arrangements(LettersInYourWord, DesiredLettersInArrangement, k = 1)
  if ( k > DesiredLettersInArrangements )
  {
    print st;
    return
  }

  for ( each position i in your word that hasn't been used before )
    st[k] = YourWord[i];
    Arrangements(<same>, <same>, k + 1);

Then for "abc" call Arrangements(3, 3, 1); and then Arrangements(3, 2, 1);

IVlad
A: 

You can generate all words in an alphabet by finding all paths in a complete graph of the letters. You can find all paths in that graph by doing a depth first search from each letter and returning the current path at each point.

Peter Alexander
+1  A: 

For the purpose of writing an anagram solver the kind of which you linked, the algorithm that you are requesting is not necessary. It is also VERY expensive.

Let's look at a 6-letter word like MONKEY, for example. All 6 letters of the word are different, so you would create:

  • 6*5*4*3*2*1 different 6-letter words
  • 6*5*4*3*2 different 5-letter words
  • 6*5*4*3 different 4-letter words
  • 6*5*4 different 3-letter words
  • 6*5 different 2-letter words
  • For a total of 1950 words

Now, presumably you're not trying to spit out all 1950 words (e.g. 'OEYKMN') as anagrams (which they are, but most of them are also gibberish). I'm guessing you have a dictionary of legal English words, and you just want to check if any of those words are anagrams of the query word, with the option of not using all letters.

If that is the case, then the problem is simple.

To determine if 2 words are anagrams of each other, all you need to do is count how many times each letters are used, and compare these numbers!

Let's restrict ourself to only 26 letters A-Z, case insensitive. What you need to do is write a function countLetters that takes a word and returns an array of 26 numbers. The first number in the array corresponds to the count of the letter A in the word, second number corresponds to count of B, etc.

Then, two words W1 and W2 are exact anagram if countLetters(W1)[i] == countLetters(W2)[i] for every i! That is, each word uses each letter the exact same number of times!

For what I'd call sub-anagrams (MONEY is a sub-anagram of MONKEY), W1 is a sub-anagram of W2 if countLetters(W1)[i] <= countLetters(W2)[i] for every i! That is, the sub-anagram may use less of certain letters, but not more!

(note: MONKEY is also a sub-anagram of MONKEY).


This should give you a fast enough algorithm, where given a query string, all you need to do is read through the dictionary once, comparing the letter count array of each word against the letter count array of the query word. You can do some minor optimizations, but this should be good enough.

Alternatively, if you want utmost performance, you can preprocess the dictionary (which is known in advance) and create a directed acyclic graph of sub-anagram relationship.

Here's a portion of such a graph for illustration:

 D=1,G=1,O=1  ----------> D=1,O=1
  {dog,god}   \            {do,od}
               \
                \-------> G=1,O=1
                           {go}

Basically each node is a bucket for all words that have the same letter count array (i.e. they're exact anagrams). Then there's a node from N1 to N2 if N2's array is <= (as defined above) N1's array (you can perform transitive reduction to store the least amount of edges).

Then to list all sub-anagrams of a word, all you have to do is find the node corresponding to its letter count array, and recursively explore all nodes reachable from that node. All their buckets would contain the sub-anagrams.

polygenelubricants
Thank you so much! This approach works well for me.
Alan
A: 

There is simple O(N) where n is size of vocabulary. Just sort letters in each word in vocabulary or better, create binary mask of them and then compare whit letters you have.

ralu
A: 

Hi Alan, did you get this to work? would you be willing to share the code? This is something I am considering creating in AS3 too.

Dreamguts