I'm trying to figure out if there's a reasonably efficient way to perform a lookup in a dictionary (or a hash, or a map, or whatever your favorite language calls it) where the keys are regular expressions and strings are looked up against the set of keys. For example (in Python syntax):
>>> regex_dict = { re.compile(r'foo.') : 12, re.compile(r'^FileN.*$') : 35 }
>>> regex_dict['food']
12
>>> regex_dict['foot in my mouth']
12
>>> regex_dict['FileNotFoundException: file.x does not exist']
35
(Obviously the above example won't work as written in Python, but that's the sort of thing I'd like to be able to do.)
I can think of a naive way to implement this, in which I iterate over all of the keys in the dictionary and try to match the passed in string against them, but then I lose the O(1) lookup time of a hash map and instead have O(n), where n is the number of keys in my dictionary. This is potentially a big deal, as I expect this dictionary to grow very large, and I will need to search it over and over again (actually I'll need to iterate over it for every line I read in a text file, and the files can be hundreds of megabytes in size).
Is there a way to accomplish this, without resorting to O(n) efficiency?
Alternatively, if you know of a way to accomplish this sort of a lookup in a database, that would be great, too.
(Any programming language is fine -- I'm using Python, but I'm more interested in the data structures and algorithms here.)
Someone pointed out that more than one match is possible, and that's absolutely correct. Ideally in this situation I'd like to return a list or tuple containing all of the matches. I'd settle for the first match, though.
I can't see O(1) being possible in that scenario; I'd settle for anything less than O(n), though. Also, the underlying data structure could be anything, but the basic behavior I'd like is what I've written above: lookup a string, and return the value(s) that match the regular expression keys.