views:

254

answers:

3

Given a list of words which contains the letters a-z at least once, how would you write a program to find the shortest pangram counted by number of characters (not counting spaces) as a combination of the words?

Since I am not sure whether short answers exist, this is not code golf, but rather just a discussion of how you would approach this. However, if you think you can manage to write a short program that would do this, then go ahead, and this might turn into code golf :)

+4  A: 

I would approach this by proving that the problem is NP-hard, and by checking heuristics for the NP-hard problems that look similar.

We can reduce a Set Cover problem to our one. Set Cover is different in that not a number of letters used is minimized, but a number of words used is minimized instead. Assume we want to solve Set Cover problem, given N words, each of length less than M. Let's build another set of words by cloning the given set, but concatenating to each of them N*M non-english letters, say, Ж. If we could build a pangram (over a,b,c...x,y,z,ж alphabet) that requires minimum symbols, that would be a pangram with minimum words, if we remove all Ж letters.

This proves that the original problem is NP-hard, but, unfortunately we need a reduction to some NP-hard problem to reuse its (hopefully already known) heuristic. Set-Cover has a greedy heuristic with logarithmic approximation, but I don't think it applies to the original problem (nature of the Set-Cover problem requires taking letter-rich, long words; it's not the way to solve our problem).

So I'd search a list of related NP-hard problems, and check if there's something of interest. That's how I'd approach this one.

Pavel Shved
I had no knowledge of the Set Cover problem before proposing this one. But anyways, even if it is NP-hard, since the universe is just 26 letters and if we add in an upper limit to the number of words in the dictionary there would be a possible solution to this. Of course, the easiest way would be brute force greedy recursion but there definitely are alternatives, like counting the number of times a letter is repeated total and working with that or something.
jonathanasdf
26 letters constrain you to "just" 67 million words. :-) This hardly means that you can elide NP-completeness. But this relinquishes requirements to the heuristics: they may be slower than fastest ones. I'm sorry I can't propose some right now; perhaps, someone else will.
Pavel Shved
I never said it would not be un-NP-complete, but that an algorithm can definitely be made under those constraints (Which would include a realistic upper limit to the number of words in the dictionary used)
jonathanasdf
Realistic constraint would be 100-200 thousand of words, the size of English dictionary. Btw, you may try the following heuristic: iteratively pick words with the highest ratio `(number of unique letters still not persisting in a pangram)/(length of the word)`.
Pavel Shved
In addition to that heuristic, it may be better to assign a value to each letter based on how often it appears in the set of input words, to decide which words to pick over other words.But yeah it's pretty much decided that this does not have a decisive solution so i'm going to just accept this answer.
jonathanasdf
@Pavel, we can solve this in `O(n)`.. I was asked this question in an interview and the interviewer guided me for more than half and hour to get to `O(n)` from brute-force.
Anurag
reading the question properly always helps. i didn't see the *combination of words* portion, ignore my solution :)
Anurag
+2  A: 

This is an variant of the set cover problem (a.k.a. hitting set problem):

As input you are given several sets. They may have some elements in common. You must select a minimum number of these sets so that the sets you have picked contain all the elements that are contained in any of the sets in the input. It was [...] shown to be NP-complete in 1972[,] and the optimization version of set cover is NP-hard.

It is a variant because we're looking for the minimum number of letters, not the minimum number of words. But I'd think it's still NP-hard, which means that you will not be able to do much better than brute force.

Thomas
@Thomas, a `O(n)` solution is possible, see http://stackoverflow.com/questions/2709477/how-would-you-write-a-program-to-find-the-shortest-pangram-in-a-list-of-words/2709904#2709904
Anurag
false alarm :), ignore it
Anurag
+1  A: 

Here's an O(n) algorithm for a different problem for when you have a string instead of a list of words as input.. It was my oversight, but will leave the solution here cause I don't feel like deleting it :)

Since we are only interested in characters, it makes the problem a lot easier. Maintain a map of each character [a-z] to its position in the string. This map alone is sufficient do determine if we have a pangram and what's its length.

1. Initialize a map of all alphabets to null
2. Initialize shortest_pangram to { length: ∞, value: undefined }
3. Loop through each "character" in given string
  3.1 Update the value of map[character] to current string index
  3.2 If we have a pangram, and its the shortest so far, record its length/value
4. shortest_pangram should have our result

The map we created is enough to determine if we have a pangram - if all values in our map are non null, we have a pangram.

To find the length of the current pangram, subtract the max value from the min value in our map. Remember that before finding the length, we must check if it is a pangram.

Here's a naive non-optimized implementation in Ruby:

class Pangram
  def initialize(string)
    @input = string.downcase.split('')
    @map = {}
    ('a'..'z').each { |c| @map[c] = nil }
    infinity = 1.0/0.0
    @best = { :length => infinity, :string => nil }
  end

  def shortest
    @input.each_with_index do |c, index|
      @map[c] = index if @map.key?(c)
      if pangram? and length < @best[:length]
        @best[:length] = length
        @best[:string] = value
      end
    end
    @best
  end

  def pangram?
    @map.values.all? { |value| !value.nil? }
  end

  def length
    @map.values.max - @map.values.min
  end

  def value
    @input[@[email protected]].join('')
  end
end

To use, instantiate the class and pass it the entire string. Call .shortest to find the length of the shortest pangram and the matching substring.

pangram = Pangram.new("..")
print pangram.shortest
Anurag
thanks for the attempt, even if you completely misunderstood the question :)
jonathanasdf