ansaurus

Question

How do I implement a dictionary "with a Python tuple" as key in C++?

Answer 1

+3 A:

Use std::map<std::pair<..., std::string>, ...> if you're hell-bent on using C++ for this.

Ignacio Vazquez-Abrams 2010-06-16 14:13:08

If MdaG was truly hellbent, I think he'd have used caps lock.

Ken 2010-06-16 15:09:55

Thanks, pair was what I was missing. :-) And I'm not hell-bent; if I can use Cython or a better algorithm, that's even better. :-)

MdaG 2010-06-16 20:33:49

Answer 2

+2 A:

For one, you're calling separate_words(post.text) for every search_word in search_words. You should call separate_words only once for each post in posts.

That is, rather than:

for search_word in search_words:
    for post in posts:
        # do heavy work

you should instead have:

for post in posts:
    # do the heavy work, once per post
    for search_word in search_words:
        ...
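
As a rough, runnable illustration of the reordering above: the sketch below assumes posts are plain strings (the original uses post.text) and uses a trivial stand-in for separate_words, since the real function is not shown here. The call counter exists only to show that the heavy step runs once per post.

```python
from collections import Counter

calls = Counter()  # counts how often the expensive step runs


def separate_words(text):
    # Hypothetical stand-in for the question's separate_words.
    calls["separate_words"] += 1
    return text.lower().split()


def match_posts(posts, search_words):
    """Return {post index: [search words found]}, splitting each post once."""
    matches = {}
    for index, text in enumerate(posts):
        post_words = set(separate_words(text))  # heavy work, once per post
        for search_word in search_words:
            if search_word in post_words:
                matches.setdefault(index, []).append(search_word)
    return matches


posts = ["The quick brown fox", "Lazy dogs sleep", "A quick nap"]
search_words = ["quick", "lazy", "nap"]
result = match_posts(posts, search_words)
print(result)
print(calls["separate_words"], "calls for", len(posts), "posts")
```

With the loops the other way around, the counter would reach len(posts) * len(search_words) instead of len(posts).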

If, as I suspect, separate_words does a lot of string manipulation, keep in mind that string manipulation is relatively expensive in Python: strings are immutable, so every modification creates a new string.
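
To make the immutability point concrete, here is a small sketch. The function names and the punctuation set are made up for illustration and are not taken from the question. The first version builds its result with repeated +=, which conceptually creates a new string each time (CPython can sometimes optimize this, but it is not guaranteed); the second collects the pieces and joins them once.

```python
def strip_punctuation_concat(text, punctuation=".,!?"):
    # Hypothetical helper: each += conceptually builds a new string object.
    cleaned = ""
    for ch in text:
        if ch not in punctuation:
            cleaned += ch
    return cleaned


def strip_punctuation_join(text, punctuation=".,!?"):
    # Same result, but the final string is assembled once by str.join.
    return "".join(ch for ch in text if ch not in punctuation)


original = "Hello, world! Ready?"
cleaned = strip_punctuation_join(original)
print(cleaned)
print(original)  # the original string is left untouched
```

Whether this matters in practice depends on how long the strings are and how often the function runs, so it is worth profiling before rewriting anything.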

Another improvement: you don't have to compare every word in search_words with every word in post_words. If you keep search_words and post_words sorted (or grouped) by word length, you can use a sliding-window technique. Since a search_word can only match a post_word when their lengths are close (within 2 of each other in the code below), you only need to check the post words whose length falls inside that window, which cuts down the number of words to check, e.g.:

import collections
import itertools

search_words = sorted(search_words, key=len)

# Group the post words by length (this can probably use a list of lists).
g_post_words = collections.defaultdict(list)
for post_word in post_words:
    g_post_words[len(post_word)].append(post_word)

for search_word in search_words:
    l = len(search_word)
    # Only look at post words whose length is within 2 of the search word's.
    # .get() avoids creating empty buckets in the defaultdict.
    candidates = itertools.chain.from_iterable(
        g_post_words.get(m, []) for m in range(l - 2, l + 3)
    )
    for post_word in candidates:
        score = calculate_score(search_word, post_word)
        # ... and the rest ...

(This code probably won't work as is; it's just to illustrate the idea.)
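
For anyone who wants to try the length-window idea in isolation, here is a self-contained sketch. The function name candidates_by_length, the max_diff parameter, and the sample words are assumptions made for this example; scoring is left out because calculate_score is not shown in the thread.

```python
import collections
import itertools


def candidates_by_length(search_words, post_words, max_diff=2):
    """Yield (search_word, candidates) where candidates are the post words
    whose length is within max_diff of the search word's length."""
    buckets = collections.defaultdict(list)
    for post_word in post_words:
        buckets[len(post_word)].append(post_word)

    for search_word in search_words:
        n = len(search_word)
        lengths = range(max(n - max_diff, 0), n + max_diff + 1)
        # .get() keeps the defaultdict from growing empty buckets.
        candidates = itertools.chain.from_iterable(
            buckets.get(m, []) for m in lengths
        )
        yield search_word, list(candidates)


post_words = ["a", "tree", "trees", "forest", "forestry", "rainforests"]
pairs = dict(candidates_by_length(["tres", "forests"], post_words))
for search_word, candidates in pairs.items():
    print(search_word, "->", candidates)
```

Each search word is then compared only against its candidates rather than against every post word; the saving depends on how spread out the word lengths are.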

Lie Ryan 2010-06-16 15:03:49

This is valuable input and you're correct. I'm not familiar with itertools, but this is a good time to read up on it. Thank you. :)

MdaG 2010-06-16 20:31:50
