ansaurus

Question

How would you write a program to find the shortest pangram in a list of words?

Answer 1

+4 A:

I would approach this by proving that the problem is NP-hard, and by checking heuristics for the NP-hard problems that look similar.

We can reduce a Set Cover problem to our one. Set Cover is different in that not a number of letters used is minimized, but a number of words used is minimized instead. Assume we want to solve Set Cover problem, given N words, each of length less than M. Let's build another set of words by cloning the given set, but concatenating to each of them N*M non-english letters, say, Ж. If we could build a pangram (over a,b,c...x,y,z,ж alphabet) that requires minimum symbols, that would be a pangram with minimum words, if we remove all Ж letters.

This proves that the original problem is NP-hard, but, unfortunately we need a reduction to some NP-hard problem to reuse its (hopefully already known) heuristic. Set-Cover has a greedy heuristic with logarithmic approximation, but I don't think it applies to the original problem (nature of the Set-Cover problem requires taking letter-rich, long words; it's not the way to solve our problem).

So I'd search a list of related NP-hard problems, and check if there's something of interest. That's how I'd approach this one.

Pavel Shved 2010-04-25 19:30:20

I had no knowledge of the Set Cover problem before proposing this one. But anyways, even if it is NP-hard, since the universe is just 26 letters and if we add in an upper limit to the number of words in the dictionary there would be a possible solution to this. Of course, the easiest way would be brute force greedy recursion but there definitely are alternatives, like counting the number of times a letter is repeated total and working with that or something.

jonathanasdf 2010-04-25 19:53:25

26 letters constrain you to "just" 67 million words. :-) This hardly means that you can elide NP-completeness. But this relinquishes requirements to the heuristics: they may be slower than fastest ones. I'm sorry I can't propose some right now; perhaps, someone else will.

Pavel Shved 2010-04-25 19:59:00

I never said it would not be un-NP-complete, but that an algorithm can definitely be made under those constraints (Which would include a realistic upper limit to the number of words in the dictionary used)

jonathanasdf 2010-04-25 20:01:48

Realistic constraint would be 100-200 thousand of words, the size of English dictionary. Btw, you may try the following heuristic: iteratively pick words with the highest ratio `(number of unique letters still not persisting in a pangram)/(length of the word)`.

Pavel Shved 2010-04-25 20:05:02

In addition to that heuristic, it may be better to assign a value to each letter based on how often it appears in the set of input words, to decide which words to pick over other words.But yeah it's pretty much decided that this does not have a decisive solution so i'm going to just accept this answer.

jonathanasdf 2010-04-25 20:11:42

@Pavel, we can solve this in `O(n)`.. I was asked this question in an interview and the interviewer guided me for more than half and hour to get to `O(n)` from brute-force.

Anurag 2010-04-25 21:00:15

reading the question properly always helps. i didn't see the *combination of words* portion, ignore my solution :)

Anurag 2010-04-25 21:55:49

Answer 2

+2 A:

This is an variant of the set cover problem (a.k.a. hitting set problem):

As input you are given several sets. They may have some elements in common. You must select a minimum number of these sets so that the sets you have picked contain all the elements that are contained in any of the sets in the input. It was [...] shown to be NP-complete in 1972[,] and the optimization version of set cover is NP-hard.

It is a variant because we're looking for the minimum number of letters, not the minimum number of words. But I'd think it's still NP-hard, which means that you will not be able to do much better than brute force.

Thomas 2010-04-25 19:30:44

@Thomas, a `O(n)` solution is possible, see http://stackoverflow.com/questions/2709477/how-would-you-write-a-program-to-find-the-shortest-pangram-in-a-list-of-words/2709904#2709904

Anurag 2010-04-25 21:01:09

false alarm :), ignore it

Anurag 2010-04-25 21:56:10

Answer 3

+1 A:

Here's an O(n) algorithm for a different problem for when you have a string instead of a list of words as input.. It was my oversight, but will leave the solution here cause I don't feel like deleting it :)

Since we are only interested in characters, it makes the problem a lot easier. Maintain a map of each character [a-z] to its position in the string. This map alone is sufficient do determine if we have a pangram and what's its length.

1. Initialize a map of all alphabets to null
2. Initialize shortest_pangram to { length: ∞, value: undefined }
3. Loop through each "character" in given string
  3.1 Update the value of map[character] to current string index
  3.2 If we have a pangram, and its the shortest so far, record its length/value
4. shortest_pangram should have our result

The map we created is enough to determine if we have a pangram - if all values in our map are non null, we have a pangram.

To find the length of the current pangram, subtract the max value from the min value in our map. Remember that before finding the length, we must check if it is a pangram.

Here's a naive non-optimized implementation in Ruby:

class Pangram
  def initialize(string)
    @input = string.downcase.split('')
    @map = {}
    ('a'..'z').each { |c| @map[c] = nil }
    infinity = 1.0/0.0
    @best = { :length => infinity, :string => nil }
  end

  def shortest
    @input.each_with_index do |c, index|
      @map[c] = index if @map.key?(c)
      if pangram? and length < @best[:length]
        @best[:length] = length
        @best[:string] = value
      end
    end
    @best
  end

  def pangram?
    @map.values.all? { |value| !value.nil? }
  end

  def length
    @map.values.max - @map.values.min
  end

  def value
    @input[@[email protected]].join('')
  end
end

To use, instantiate the class and pass it the entire string. Call .shortest to find the length of the shortest pangram and the matching substring.

pangram = Pangram.new("..")
print pangram.shortest

Anurag 2010-04-25 20:48:13

thanks for the attempt, even if you completely misunderstood the question :)

jonathanasdf 2010-04-25 23:47:23

ansaurus

tags:

views:

answers:

How would you write a program to find the shortest pangram in a list of words?

related questions