Hey all,
While optimizing performance of an app of mine, I ran across a huge performance bottleneck in few lines of (Python) code.
I have N tokens. each token has a value assigned to it. Some of the tokens contradict (e.g. tokens 8 and 12 cannot "live together"). My job is to find the k-best token-groups. The value of a group of tokens is simply the sum of the values of the tokens in it.
Naïve algorithm (which I have implemented...):
- find all 2^N token-group permutations of the tokens
- Eliminate the token-groups that have contradictions in them
- Calculate the value of all remaining token-groups
- Sort token-groups by value
- Choose top K token-groups
Real world numbers - I need top 10 token groups from a group of 20 tokens (for which I calculated the 1,000,000 permutations (!)), narrowed down to 3500 non-contradicting token groups. This took 5 seconds on my laptop...
I'm sure I can optimize steps 1+2 somehow by generating just the non-contradicting token-groups.
I'm also pretty sure I can somehow magically find the best token-group in a single search and find a way to traverse the token-groups by diminishing value, thus finding just the 10-best I am looking for....
my actual code:
all_possibilities = sum((list(itertools.combinations(token_list, i)) for i in xrange(len(token_list)+1)), [])
all_possibilities = [list(option) for option in all_possibilities if self._no_contradiction(option)]
all_possibilities = [(option, self._probability(option)) for option in all_possibilities]
all_possibilities.sort(key = lambda result: -result[1]) # sort by descending probability
Please help?
Tal.