views:

95

answers:

5

In PHP, I had this line matches = preg_grep('/^for/', array_keys($hash)); What it would do is it would grab the words: fork, form etc. that are in $hash.

In Python, I have a dict with 400,000 words. It's keys are words I'd like to present in an auto-complete like feature (the values in this case are meaningless). How would I be able to return the keys from my dictionary that match the input?

For example (as used earlier), if I have

my_dic = t{"fork" : True, "form" : True, "fold" : True, "fame" : True}

and I get some input "for", It'll return a list of "fork", "form".

+1  A: 
>>> my_dict = {"fork" : True, "form" : True, "fold" : True, "fame" : True}
>>> import re
>>> [s for s in my_dict if re.search('^for', s) is not None]
['fork', 'form']

Use of regex is more universal as you could provide more complex search patterns, if it's only about prefixes, you could use string methods: str.startwith, for example:

>>> [s for s in my_dict if s.startswith('for')]
['fork', 'form']
SilentGhost
+6  A: 
>>> mydict={"fork" : True, "form" : True, "fold" : True, "fame" : True}
>>> [k for k in mydict if k.startswith("for")]
['fork', 'form']

This should be faster than using a regular expression (and sufficient if you're just looking for word beginnings).

Tim Pietzcker
A: 

You can get the keys from my_dict with my_dict.keys(). Then, you can search through each key to see if it matches your regular expression.

m = re.compile('^for')
keys = []
for key in my_dict.keys():
   if m.match(key) != None:
      keys.append(key)
orangeoctopus
+3  A: 

So this isn't a direct answer to what you ask, but..

It seems like you don't really want a dict for this sort of thing, you're looking for a tree-like structure, right?

Then you can walk the tree for each letter that is typed (constant time), and return leaves from that subsection of the tree as the words that match that prefix.

Josh K
This particular case isn't the only time I'm using the dict. It's an inverted index so the values are a set of document IDs that are absolutely vital to what I'm doing. The reason I am using a dict is because the look up will be much faster than a tree (memory is plenty, CPU cycles are not)
tipu
Although known-key lookup will be faster with a dict than a tree structure, having to test every key for a partial match won't be - so in cases where you don't know the key in advance (such as you present, above) something a little more tree-like would be better.
pycruft
Fyi, the perfect data structure for this problem is called a **trie** - but python's stdlib doesn't have one.
THC4k
+1  A: 

If you want a specific lookup strategy (such as the "startswith 3 chars" outlined above), you could probably get a quick win by creating a specific lookup dictionary based around that idea.

q = {"fork":1, "form":2, "fold":3, "fame":4}
from collections import defaultdict
q1 = defaultdict(dict)
for k,v in q.items():
    q1[k[:3]][k]=v

This would let you do a .startswith type lookup over a much smaller set

def getChoices(frag):
    d = q1.get(frag[:3])
    if d is None:
        return []
    return [ k for k in d.keys() if k.startswith(frag) ]

Hopefully that should be a lot quicker than processing the whole 400,000 keys.

pycruft