Pardon the ambiguity in the title; I wasn't quite sure how to phrase my question.

Given a string:

blah = "There are three cats in the hat"

and the "userInfo" mapping (I'm not quite sure which data structure to use for this):

cats -> ("tim", "1 infinite loop")
three -> ("sally", "123 fake st")
three -> ("tim", "1 infinite loop")
three cats -> ("john", "123 fake st")
four cats -> ("albert", "345 real road")
dogs -> ("tim", "1 infinite loop")
cats hat -> ("janet", NULL)

The proper output should be:

tim (since 'cats' exists)
sally (since 'three' exists)
tim (since 'three' exists)
john (since both 'three' and 'cats' exist)
janet (since both 'cats' and 'hat' exist somewhere in the string blah)

I want an efficient way of storing this data. The same key can map to many people (e.g., 150 people may all have the string 'three'). Should I just keep a list with all this data and duplicate the "keys"?

A: 

I'm not sure what exactly you're trying to do, but maybe you're looking for something like this:

userinfo = {
  "tim": "1 infinite loop",
  "sally": "123 fake st",
  "john": "123 fake st",
  "albert": "345 real road",
  "janet": None
}

conditions = {
  "cats": ["tim"],
  "three": ["sally", "tim"],
  "three cats": ["john"],
  "four cats": ["albert"],
  "dogs": ["tim"],
  "cats hat": ["janet"]
}

# all_words_are_in_the_sentence() is a placeholder; see the comments below
for c in conditions:
  if all_words_are_in_the_sentence(c):
    for p in conditions[c]:
      print("%s because of %s" % (p, c))
      print("additional info: %s" % userinfo[p])
sth
Can you provide the implementation for `all_words_are_in_the_sentence`?
S.Lott
That implementation depends on what the OP actually wants to do there, which wasn't really clear to me from the original question. In the simplest case it could be `blah.find(c) != -1`. If everything should be based on words, the dict keys should probably already be split into words, like `frozenset(["cats", "hat"])`. Then set intersection could be used to compare against `set(blah.split())`. Or a function like `matches` in your answer could be used. What the best solution is really depends on the actual data.
sth
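The word-based variant described in the comment above could be sketched roughly as follows. This is only one possible reading of the question: it assumes matching is case-sensitive and on whole whitespace-separated words, and the helper `matching_users` is a hypothetical name introduced for illustration.

```python
# Conditions keyed by the set of words that must all appear in the sentence.
# Several people can share one key, so nothing is duplicated.
conditions = {
    frozenset(["cats"]): ["tim"],
    frozenset(["three"]): ["sally", "tim"],
    frozenset(["three", "cats"]): ["john"],
    frozenset(["four", "cats"]): ["albert"],
    frozenset(["dogs"]): ["tim"],
    frozenset(["cats", "hat"]): ["janet"],
}


def all_words_are_in_the_sentence(condition_words, sentence_words):
    # Subset test: every word of the condition occurs in the sentence.
    return condition_words <= sentence_words


def matching_users(sentence):
    # Hypothetical helper: returns (person, condition) pairs for one sentence.
    sentence_words = set(sentence.split())  # split once, reuse for every key
    matches = []
    for condition_words, people in conditions.items():
        if all_words_are_in_the_sentence(condition_words, sentence_words):
            label = " ".join(sorted(condition_words))
            for person in people:
                matches.append((person, label))
    return matches
```

Calling `matching_users("There are three cats in the hat")` yields one pair per person whose condition words all occur in the sentence. Note that the label is rebuilt from a set, so the original word order of a key such as "three cats" is not preserved.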
+1  A: 

I haven't got the slightest clue what you are actually trying to do, but if you have a lot of data that you need to store and search, some sort of database with indexing capabilities seems to be the way to go.

Whether you use ZODB, CouchDB or SQL is a matter of taste. I seriously doubt you need to care about efficiency in disk space as much as about speed for searching and lookups anyway.

Lennart Regebro
Hooray for CouchDB!
Jason R. Coombs
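To make the indexed-database suggestion above a little more concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The answer only says "SQL", so the choice of SQLite, the table names `rules` and `rule_words`, and the functions `build_db` and `find_matches` are all assumptions made for illustration. The idea is to store one row per (rule, word) pair, index the word column, and keep only rules whose every word was found in the sentence.

```python
import sqlite3

RULES = [
    ("cats", "tim", "1 infinite loop"),
    ("three", "sally", "123 fake st"),
    ("three", "tim", "1 infinite loop"),
    ("three cats", "john", "123 fake st"),
    ("four cats", "albert", "345 real road"),
    ("dogs", "tim", "1 infinite loop"),
    ("cats hat", "janet", None),
]


def build_db(rules):
    # Hypothetical schema: one row per rule, plus one row per word of a rule.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rules (id INTEGER PRIMARY KEY, condition TEXT, "
        "name TEXT, address TEXT, n_words INTEGER)"
    )
    conn.execute("CREATE TABLE rule_words (rule_id INTEGER, word TEXT)")
    conn.execute("CREATE INDEX idx_rule_words_word ON rule_words (word)")
    for condition, name, address in rules:
        words = set(condition.split())
        cur = conn.execute(
            "INSERT INTO rules (condition, name, address, n_words) "
            "VALUES (?, ?, ?, ?)",
            (condition, name, address, len(words)),
        )
        conn.executemany(
            "INSERT INTO rule_words (rule_id, word) VALUES (?, ?)",
            [(cur.lastrowid, w) for w in words],
        )
    return conn


def find_matches(conn, sentence):
    words = sorted(set(sentence.split()))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    # A rule matches when the number of its words found in the sentence
    # equals the total number of words the rule requires.
    query = (
        "SELECT r.condition, r.name, r.address FROM rules r "
        "JOIN rule_words w ON w.rule_id = r.id "
        "WHERE w.word IN (%s) "
        "GROUP BY r.id HAVING COUNT(*) = r.n_words "
        "ORDER BY r.id" % placeholders
    )
    return conn.execute(query, words).fetchall()
```

With `conn = build_db(RULES)`, calling `find_matches(conn, "There are three cats in the hat")` returns the rows for tim, sally, tim, john and janet in insertion order. Whether this beats a plain in-memory dict depends on how much data there is; for a few hundred rules it is probably overkill.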
+6  A: 

Something like this?

class Content(object):
    def __init__(self, content, maps_to):
        # Words that must all be present for this rule to match
        self.content = content.split()
        self.maps_to = maps_to

    def matches(self, words):
        return all(c in words for c in self.content)

    def __str__(self):
        return "%s -> %r" % (" ".join(self.content), self.maps_to)

rules = [
    Content('cats',("tim", "1 infinite loop")),
    Content('three',("sally", "123 fake st")),
    Content('three',("tim", "1 infinite loop")),
    Content('three cats',("john", "123 fake st")),
    Content('four cats',("albert", "345 real road")),
    Content('dogs',("tim", "1 infinite loop")),
    Content('cats hat', ("janet", None)),
]

blah = "There are three cats in the hat"

words = set(blah.split())  # split once; set membership tests are fast
for r in rules:
    if r.matches(words):
        print(r)

Output

cats -> ('tim', '1 infinite loop')
three -> ('sally', '123 fake st')
three -> ('tim', '1 infinite loop')
three cats -> ('john', '123 fake st')
cats hat -> ('janet', None)
S.Lott