views:

168

answers:

1

I'm developing an app in Google App Engine. One of my methods is taking never completing, which makes me think it's caught in an infinite loop. I've stared at it, but can't figure it out.

Disclaimer: I'm using http://code.google.com/p/gaeunitlink text to run my tests. Perhaps it's acting oddly?

This is the problematic function:

def _traverseForwards(course, c_levels):
    ''' Looks forwards in the dependency graph '''
    result = {'nodes': [], 'arcs': []}

    if c_levels == 0:
        return result

    model_arc_tails_with_course = set(_getListArcTailsWithCourse(course))
    q_arc_heads = DependencyArcHead.all()

    for model_arc_head in q_arc_heads:
        for model_arc_tail in model_arc_tails_with_course:
            if model_arc_tail.key() in model_arc_head.tails:
                result['nodes'].append(model_arc_head.sink)
                result['arcs'].append(_makeArc(course, model_arc_head.sink))

                # rec_result = _traverseForwards(model_arc_head.sink, c_levels - 1)

                # _extendResult(result, rec_result)

    return result

Originally, I thought it might be a recursion error, but I commented out the recursion and the problem persists. If this function is called with c_levels = 0, it runs fine.

The models it references:

class Course(db.Model):
    dept_code = db.StringProperty()
    number = db.IntegerProperty()
    title = db.StringProperty()
    raw_pre_reqs = db.StringProperty(multiline=True)
    original_description = db.StringProperty()

    def getPreReqs(self):
        return pickle.loads(str(self.raw_pre_reqs))

    def __repr__(self):
        return "%s %s: %s" % (self.dept_code, self.number, self.title)

class DependencyArcTail(db.Model):
    ''' A list of courses that is a pre-req for something else '''
    courses = db.ListProperty(db.Key)

    def equals(self, arcTail):
        for this_course in self.courses:
            if not (this_course in arcTail.courses):
                return False

        for other_course in arcTail.courses:
            if not (other_course in self.courses):
                return False

        return True

class DependencyArcHead(db.Model):
    ''' Maintains a course, and a list of tails with that course as their sink '''
    sink = db.ReferenceProperty()
    tails = db.ListProperty(db.Key) 

Utility functions it references:

def _makeArc(source, sink):
    return {'source': source, 'sink': sink}

def _getListArcTailsWithCourse(course):
    ''' returns a LIST, not SET 
        there may be duplicate entries
    '''
    q_arc_heads = DependencyArcHead.all()
    result = []
    for arc_head in q_arc_heads:
        for key_arc_tail in arc_head.tails:
            model_arc_tail = db.get(key_arc_tail)
            if course.key() in model_arc_tail.courses:
                result.append(model_arc_tail)

    return result

Am I missing something pretty obvious here, or is GAEUnit acting up?

Also - the test that is making this run slow has no more than 5 models of any kind in the datastore. I know this is potentially slow, but my app only does this once then subsequently caches it.

+3  A: 

Ignoring the commented out recursion, I don't think this should be an infinite loop - you are just doing some for-loops over finite results sets.

However, it does seem like this would be really slow. You're looping over entire tables and then doing more datastore queries in every nested loop. It seems unlikely that this sort of request would complete in a timely manner on GAE unless your tables are really, really small.


Some rough numbers:

If H = # of entities in DepedencyArcHead and T = average # of tails in each DependencyArcHead then:

  • _getListArcTailsWithCourse is doing about H*T queries (understimate). In the "worst" case, the result returned from this function will have H*T elements.
  • _traverseForwards loops over all these results H times, and thus does another H*(H*T) queries.
  • Even if H and T are only on the order of 10s, you could be doing thousands of queries. If they're bigger, then ... (and this ignores any additional queries you'd do if you uncommented the recursive call).

In short, I think you may want to try to organize your data a little differently if possible. I'd make a specific suggestion, but what exactly you're trying to do isn't clear to me.

David Underhill