I have some Python code that I'd like to port to C++ because it's slower than I'd like it to be. The problem is that it uses a dictionary where the key is a tuple consisting of an object and a string (e.g. (obj, "word")). How on earth do I write something similar in C++? Or maybe my algorithm is horrendous and there is some way I can make it faster without resorting to C++?
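To make the data structure concrete, here is a minimal sketch of the kind of dictionary I mean. The Post class below is a hypothetical stand-in for my real class (cl_post.Post), and the sketch assumes it relies on Python's default identity-based hashing:

```python
class Post:
    # Hypothetical stand-in for cl_post.Post; only a text attribute is assumed.
    def __init__(self, text):
        self.text = text


first = Post("hello world")
second = Post("goodbye world")

post_score = {}
for post in (first, second):
    for word in ("hello", "world"):
        # The key is a tuple of (object, string). With default hashing the
        # object is hashed by identity, so distinct posts never share a key.
        post_score[(post, word)] = 0.0

# Accumulate a score for one (post, word) pair.
post_score[(first, "hello")] += 1.5
```

Whatever replaces this in C++ would need the same property: a composite key made of a post identity and a word.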
The whole algorithm is below for clarity's sake. The dictionary "post_score" is the issue.

    def get_best_match_best(search_text, posts):
        """
        Find the best matches between a search query "search_text" and any of the
        strings in "posts".

        @param search_text: Query to find an appropriate match with in posts.
        @type search_text: string
        @param posts: List of candidates to match with target text.
        @type posts: [cl_post.Post]
        @return: Best matches of the candidates found in posts. The posts are ordered
        according to their rank. First post in list has best match and so on.
        @returntype: [cl_post.Post]
        """
        from math import log

        search_words = separate_words(search_text)
        total_number_of_hits = {}
        post_score = {}
        post_size = {}
        for search_word in search_words:
            total_number_of_hits[search_word] = 0.0
            for post in posts:
                post_score[(post, search_word)] = 0.0
                post_words = separate_words(post.text)
                post_size[post] = len(post_words)
                for post_word in post_words:
                    possible_match = abs(len(post_word) - len(search_word)) <= 2
                    if possible_match:
                        score = calculate_score(search_word, post_word)
                        post_score[(post, search_word)] += score
                        if score >= 1.0:
                            total_number_of_hits[search_word] += 1.0

        log_of_number_of_posts = log(len(posts))
        matches = []
        for post in posts:
            rank = 0.0
            for search_word in search_words:
                rank += post_score[(post, search_word)] * \
                        (log_of_number_of_posts - log(1.0 + total_number_of_hits[search_word]))
            matches.append((rank / post_size[post], post))
        matches.sort(reverse=True)
        return [post[1] for post in matches]
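
The per-word weight used in the final ranking loop can also be checked in isolation. This is only a sketch of that one expression; the helper name word_weight is hypothetical and does not exist in my code:

```python
from math import log


def word_weight(number_of_posts, number_of_hits):
    # Same expression as in the ranking loop:
    # log(number of posts) - log(1 + total hits for the search word).
    # Search words that match fewer post words get a larger weight.
    return log(number_of_posts) - log(1.0 + number_of_hits)


rare = word_weight(10, 1)
common = word_weight(10, 5)
```

Here rare comes out larger than common, so each post's accumulated score for a search word counts for more when that word matched less often overall.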