ansaurus

Question

Efficient Python Data Storage (Abstract Data Types?)

Answer 1

A:

I'm not sure what exactly you're trying to do, but maybe you're looking for something like this:

userinfo = {
  "tim": "1 infinite loop",
  "sally": "123 fake st",
  "john": "123 fake st",
  "albert": "345 real road",
  "janet": None
}

conditions = {
  "cats": ["tim"],
  "three": ["sally", "tim"],
  "three cats": ["john"],
  "four cats": ["albert"],
  "dogs": ["tim"],
  "cats hat": ["janet"]
}

for c in conditions:
  # all_words_are_in_the_sentence is a placeholder; it is not defined here
  if all_words_are_in_the_sentence(c):
    for p in conditions[c]:
      print(p, "because of", c)
      print("additional info:", userinfo[p])

sth 2009-09-08 20:50:47

Can you provide the implementation for `all_words_are_in_the_sentence`?

S.Lott 2009-09-08 23:52:47

That implementation depends on what the OP actually wants to do there, which wasn't really clear to me from the original question. In the simplest case it could be a substring test such as `blah.find(c) != -1`. If everything should be based on words, the dict keys should probably already be split into words, like `frozenset(["cats", "hat"])`. Then set intersection could be used to compare against `set(blah.split())`. Or a function like `matches` in your answer could be used. The best solution really depends on the actual data.

sth 2009-09-09 00:16:39
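
The word-set idea from the comment above could be sketched roughly as follows. This is only an illustration under assumptions, not the answerer's implementation: the two-argument signature of `all_words_are_in_the_sentence`, the `indexed` list, and the `matching_people` helper are all hypothetical names, and matching is case-sensitive on whitespace-separated words.

```python
# Same sample data as in the answer above.
conditions = {
    "cats": ["tim"],
    "three": ["sally", "tim"],
    "three cats": ["john"],
    "four cats": ["albert"],
    "dogs": ["tim"],
    "cats hat": ["janet"],
}

# Assumption: pre-split every key once into a frozenset of words,
# so each lookup is a cheap set comparison.
indexed = [
    (frozenset(phrase.split()), phrase, people)
    for phrase, people in conditions.items()
]


def all_words_are_in_the_sentence(condition_words, sentence_words):
    """Return True if every condition word occurs in the sentence words."""
    return condition_words <= sentence_words  # subset test


def matching_people(sentence):
    """Return sorted (person, phrase) pairs whose phrase matches the sentence."""
    sentence_words = set(sentence.split())
    result = []
    for condition_words, phrase, people in indexed:
        if all_words_are_in_the_sentence(condition_words, sentence_words):
            for person in people:
                result.append((person, phrase))
    return sorted(result)


blah = "There are three cats in the hat"
for person, phrase in matching_people(blah):
    print(person, "because of", phrase)
```

The subset operator `<=` is equivalent to checking that the intersection of the two sets equals the condition set. If substring matching is what is actually wanted, `phrase in sentence` would replace the set test.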

Answer 2

+1 A:

I haven't got the slightest clue what you are actually trying to do, but if you have a lot of data, and you need to store it, and you need to search in it, some sort of database with indexing capabilities seems to be the way to go.

Whether you use ZODB, CouchDB or SQL is a matter of taste. I seriously doubt you need to care about efficiency in disk space as much as about speed for searching and lookups anyway.

Lennart Regebro 2009-09-08 20:52:18
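
To make the indexed-database suggestion concrete, here is a minimal sketch using the `sqlite3` module from Python's standard library as one possible SQL option. The schema (`rules`, `rule_words`), the index name, and the helpers `build_db` and `find_matches` are assumptions made up for this illustration; the sample rules are borrowed from the other answers on this page, and the address data is left out for brevity.

```python
import sqlite3

# Sample (phrase, person) rules, borrowed from the other answers.
RULES = [
    ("cats", "tim"), ("three", "sally"), ("three", "tim"),
    ("three cats", "john"), ("four cats", "albert"),
    ("dogs", "tim"), ("cats hat", "janet"),
]


def build_db(rules):
    """Create an in-memory database with one row per rule and per rule word."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rules (id INTEGER PRIMARY KEY, phrase TEXT, "
        "person TEXT, n_words INTEGER)"
    )
    conn.execute("CREATE TABLE rule_words (rule_id INTEGER, word TEXT)")
    # The index lets the database find candidate rules by word quickly.
    conn.execute("CREATE INDEX idx_rule_words_word ON rule_words (word)")
    for rule_id, (phrase, person) in enumerate(rules, start=1):
        words = set(phrase.split())
        conn.execute(
            "INSERT INTO rules VALUES (?, ?, ?, ?)",
            (rule_id, phrase, person, len(words)),
        )
        conn.executemany(
            "INSERT INTO rule_words VALUES (?, ?)",
            [(rule_id, w) for w in words],
        )
    return conn


def find_matches(conn, sentence):
    """Return (phrase, person) rows whose words all occur in the sentence."""
    words = sorted(set(sentence.split()))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    query = (
        "SELECT r.phrase, r.person FROM rules r "
        "JOIN rule_words w ON w.rule_id = r.id "
        "WHERE w.word IN (%s) "
        "GROUP BY r.id HAVING COUNT(*) = r.n_words "
        "ORDER BY r.id" % placeholders
    )
    return conn.execute(query, words).fetchall()


conn = build_db(RULES)
for phrase, person in find_matches(conn, "There are three cats in the hat"):
    print(phrase, "->", person)
```

A rule matches when the number of its words found in the sentence equals its total word count, which is what the `HAVING` clause checks. For a real data set the database would live on disk rather than in `:memory:`, and very long sentences could exceed SQLite's limit on bound parameters per query.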

Hooray for CouchDB!

Jason R. Coombs 2009-09-08 21:13:58

Answer 3

+6 A:

Something like this?

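class Content(object):
    """A rule: all words in `content` must be present for it to match."""
    def __init__(self, content, maps_to):
        self.content = content.split()  # rule text as a list of words
        self.maps_to = maps_to
    def matches(self, words):
        # True only if every rule word occurs in `words`
        return all(c in words for c in self.content)
    def __str__(self):
        return "%s -> %r" % (" ".join(self.content), self.maps_to)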

rules = [
    Content('cats',("tim", "1 infinite loop")),
    Content('three',("sally", "123 fake st")),
    Content('three',("tim", "1 infinite loop")),
    Content('three cats',("john", "123 fake st")),
    Content('four cats',("albert", "345 real road")),
    Content('dogs',("tim", "1 infinite loop")),
    Content('cats hat', ("janet", None)),
]

blah = "There are three cats in the hat"

words = blah.split()  # split the sentence once, not once per rule
for r in rules:
    if r.matches(words):
        print(r)

Output

cats -> ('tim', '1 infinite loop')
three -> ('sally', '123 fake st')
three -> ('tim', '1 infinite loop')
three cats -> ('john', '123 fake st')
cats hat -> ('janet', None)

S.Lott 2009-09-08 20:58:44
