I'm guessing this is something like finding possible words given a set of Scrabble tiles, so that a character can be repeated only as many times as it is repeated in the original list.
The trick is to efficiently test each character of each word in your word file against a set containing your source letters. For each character, if found in the test set, remove it from the test set and proceed; otherwise, the word is not a match, and go on to the next word.
Python has a nice function all
for testing a set of conditions based on elements in a sequence. all
has the added feature that it will "short-circuit", that is, as soon as one item fails the condition, then no more tests are done. So if your first letter of your candidate word is 'z', and there is no 'z' in your source letters, then there is no point in testing any more letters in the candidate word.
My first shot at writing this was simply:
matches = []
for word in wordlist:
testset = set(letters)
if all(c in testset for c in word):
matches.append(word)
Unfortunately, the bug here is that if the source letters contained a single 'm', a word with several 'm's would erroneously match, since each 'm' would separately match the given 'm' in the source testset. So I needed to remove each letter as it was matched.
I took advantage of the fact that set.remove(item)
returns None, which Python treats as a Boolean False
, and expanded my generator expression used in calling all
. For each c in word, if it is found in testset, I want to additionally remove it from testset, something like (pseudo-code, not valid Python):
all(c in testset and "remove c from testset" for c in word)
Since set.remove returns a None, I can replace the quoted bit above with "not testset.remove(c)", and now I have a valid Python expression:
all(c in testset and not testset.remove(c) for c in word)
Now we just need to wrap that in a loop that checks each word in the list (be sure to build a fresh testset before checking each word, since our all
test has now become a destructive test):
for word in wordlist:
testset = set(letters)
if all(c in testset and not testset.remove(c) for c in word):
matches.append(word)
The final step is to sort the matches by descending length. We can pass a key function to sort. The builtin len
would be good, but that would sort by ascending length. To change it to a descending sort, we use a lambda to give us not len
, but -1 * len
:
matches.sort(key=lambda wd: -len(wd))
Now you can just print out the longest word, at matches[0], or iterate over all matches and print them out.
(I was surprised that this brute force approach runs so well. I used the 2of12inf.txt word list, containing over 80,000 words, and for a list of 10 characters, I get back the list of matches in about 0.8 seconds on my little 1.99GHz laptop.)