views:

256

answers:

3

Pangram is a sentence using every letter of the alphabet at least once.

Is it possible to generate shortest Pangram from given word list?

Lets say, I have word list like this

cat monkey temp banana christmas 
fast quick quickest jumping 
white brown black blue
fox xor jump jumps oven over 
now the is was 
lazy laziest crazy
dig dog joker mighty

And Like to generate possible pangrams list like followings

the quick over lazy jumps fox dog brown
brown dog fox jumps lazy over quick the
quick brown fox jumps over the lazy dog

Grammar and word ordering is no need to consider for now (I am going to do it in non-english language)

Any Ideas, algorithms, codes, references, will be greatly appreciated!

PS: This is not a homework

+1  A: 

Ideas for finding an approximate solutions:

  1. determine the letter frequency of your set
  2. score each word
  3. add words with the highest score until you have every letter

Word scoring could look something like:

score = 0
foreach unique letter in word
  score += 1/letter_frequency[letter]
score /= word.length
luke
+2  A: 

Sure. Here's one algorithm:

  1. Let Lw be the list of words given.
  2. Let Ld be the list of distinct words in Lw.
  3. Let Lc be the list of all possible combinations using words from Ld. If Ld contains n elements, Lc will contain 2n elements.
  4. Let P be the shortest Pangram (desired result). Initially P will be empty.
  5. Iterate over each item (combination) in Lc. In each iteration:
    1. Let C be the current combination being considered.
    2. Check if C is a Pangram.
      1. If C is a Pangram, check if P is empty or if C is shorter than P.
        1. If P is empty or if C is shorter than P, let P be C
Shaunak Kashyap
@Shaunak Kashyap, Thanks a lot
S.Mark
+3  A: 

The simplest way to generate all possible pangrams from the word list is probably to generate all possible combinations of words from the list, then for each of them, check whether it's a pangram. To do the check, walk through the string and set a bool to true for each letter that's in the string. At the end, it's a pangram iff the bools have all been set to true.

A more efficient method would probably be to walk through each word, and set up an array of bools (or a set of bits, such as in a 32-bit int) along with the length of the word. You can then find the bits that or'd together produce a value with all 26 bits set, and you have a pangram.

As you're putting a pangram together, you can add bounds-check, so if adding a word would make a potential pangram longer than your current shortest pangram (if any) you stop that check right there. If you start by sorting your words by length, the minute you hit a longer combination, you can quit that whole set of attempts, and go on to the next possibility.

If you want to get even more sophisticated about it, You can start by building the same kind of bit set as above. Then take those, and add together the bits to determine which letters occur in the fewest words. When you start to generate a potential pangram, you know it must include one of those words. E.g. in the list you gave above, "lazy", "laziest" and "crazy" seem to be the only ones that include 'z', so you know immediately that every pangram must include one of those three words. None of those includes a "q", and the only words that do include "q" seem to be "quick", and "quickest", so (again) every pangram must include one of those two (of course I'm going from manual inspection here, so I might have missed a word). So, every possible pangram from that list includes (and might as well start with): (quick|quickest) (lazy|laziest|crazy).

You could also consider preprocessing your word list: any word that's longer than another, but doesn't contain at least one letter missing from the other can be eliminated immediately. As a hypothetical example, if you have "ab" and "abab", you know that "abab" can never result in a shorter pangram than "ab", so you might as well eliminate it from the list immediately.

Jerry Coffin
Thanks, I think this is the way to go, (still stucking in performance issue with combinations though)
S.Mark