A naïve solution which solves the problem and is general enough for any application you might have is this:
def combinations(words, length):
if length == 0:
return []
result = [[word] for word in words]
while length > 1:
new_result = []
for combo in result:
new_result.extend(combo + [word] for word in words)
result = new_result[:]
length -= 1
return result
Basically, this gradually builds up a tree in memory of all the combinations, and then returns them. It is memory-intensive, however, and so is impractical for large-scale combinations.
Another solution for the problem is, indeed, to use counting, but then to transform the numbers generated into a list of words from the wordlist. To do so, we first need a function (called number_to_list()
):
def number_to_list(number, words):
list_out = []
while number:
list_out = [number % len(words)] + list_out
number = number // len(words)
return [words[n] for n in list_out]
This is, in fact, a system for converting decimal numbers to other bases. We then write the counting function; this is relatively simple, and will make up the core of the application:
def combinations(words, length):
numbers = xrange(len(words)**length)
for number in numbers:
combo = number_to_list(number, words)
if len(combo) < length:
combo = [words[0]] * (length - len(combo)) + combo
yield combo
This is a Python generator; making it a generator allows it to use up less RAM. There is a little work to be done after turning the number into a list of words; this is because these lists will need padding so that they are at the requested length. It would be used like this:
>>> list(combinations('01', 3))
[['0', '0', '0'], ['0', '0', '1'],
['0', '1', '0'], ['0', '1', '1'],
['1', '0', '0'], ['1', '0', '1'],
['1', '1', '0'], ['1', '1', '1']]
As you can see, you get back a list of lists. Each of these sub-lists contains a sequence of the original words; you might then do something like map(''.join, list(combinations('01', 3)))
to retrieve the following result:
['000', '001', '010', '011', '100', '101', '110', '111']
You could then write this to disk; a better idea, however, would be to use the built-in optimizations that generators have and do something like this:
fileout = open('filename.txt', 'w')
fileout.writelines(
''.join(combo) for combo in combinations('01', 3))
fileout.close()
This will only use as much RAM as necessary (enough to store one combination). I hope this helps.