Problem: I need to output the top X contributors, ranked by the number of messages each has posted.

Data: I have a collection of the posted messages. This is not a database/SQL question; the sample query below just gives an overview of the code.

    tweetsSQL = db.GqlQuery("SELECT * FROM TweetModel ORDER BY date_created DESC")

My Model:

    class TweetModel(db.Model):
        # Model Definition
        # Tweet Message ID is the Key Name
        to_user_id = db.IntegerProperty()
        to_user = db.StringProperty(multiline=False)
        message = db.StringProperty(multiline=False)
        date_created = db.DateTimeProperty(auto_now_add=False)
        user = db.ReferenceProperty(UserModel, collection_name='tweets')

From examples on SO, I was able to count the messages per contributor by doing this:

    visits = defaultdict(int)
    for t in tweetsSQL:
        visits[t.user.from_user] += 1

I can then sort the counts using:

    c = sorted(visits.iteritems(), key=operator.itemgetter(1), reverse=True)

But the only way I can see to retrieve the original objects is to loop through c, take each key name, and then search tweetsSQL for it to obtain the TweetModel object.

Is there a better way?

[EDIT] Sorry, I should have added that Count(*) is not available because I am using Google App Engine.

[EDIT 2]

In summary: given a list of messages, how do I order them by each user's message count?

In SQL, it would be:

    SELECT * FROM TweetModel GROUP BY Users ORDER BY Count(*)

But I cannot do it in SQL and need to duplicate this functionality in code. My starting point is "SELECT * FROM TweetModel"
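One possible way to express that requirement in code, as a minimal sketch rather than a definitive answer: collect the message objects themselves per user instead of only counting them, then sort the groups by size. `User`, `Tweet`, and `group_by_contributor` are hypothetical stand-ins so the example runs without App Engine; with the real models, `tweets` would be the result of the query above.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for the datastore entities (only the fields used here).
User = namedtuple("User", "from_user")
Tweet = namedtuple("Tweet", "user message")


def group_by_contributor(tweets):
    """Return [(user_name, [tweet, ...]), ...], most prolific user first."""
    by_user = defaultdict(list)
    for t in tweets:
        # Keep the original object, not just a count.
        by_user[t.user.from_user].append(t)
    # Sort the (user_name, tweets) pairs by how many tweets each user has.
    return sorted(by_user.items(), key=lambda kv: len(kv[1]), reverse=True)


alice, bob = User("alice"), User("bob")
tweets = [Tweet(alice, "a1"), Tweet(bob, "b1"), Tweet(alice, "a2")]
ranked = group_by_contributor(tweets)
```

Slicing `ranked[:X]` then gives the top X contributors together with their messages, so no second lookup pass is needed.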

A: 

I think your job would be a lot easier if you change the SQL query to something like:

    SELECT top 100 userId FROM TweetModel GROUP BY userId ORDER BY count(*)

I wouldn't bother with the TweetModel class if you only need the data to solve the stated problem.

RossFabricant
Sorry, I should have added that Count(*) is not available because I am using Google App Engine.
TimLeung
+1  A: 

Use heapq.nlargest() instead of sorted(), for efficiency; it's what it's for. I don't know the answer about the DB part of your question.
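As a rough illustration of that suggestion, assuming `visits` is the user-to-count dictionary built in the question (the sample data here is made up):

```python
import heapq
from operator import itemgetter

# Made-up sample of the per-user counts from the question.
visits = {"alice": 5, "bob": 2, "carol": 9, "dave": 1}

# Pick the 2 entries with the largest counts without sorting everything.
top_two = heapq.nlargest(2, visits.items(), key=itemgetter(1))
```

Here `top_two` is a list of `(user, count)` pairs, largest count first. On Python 2, `visits.iteritems()` can be passed in place of `visits.items()`.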

Darius Bacon
A: 

Why not invert the dictionary, once you have constructed it, so that the keys are the message counts and the values are the users? Then you can sort the keys and easily get to the users.
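A small sketch of the inversion, with one caveat that is an addition to the answer as written: several users can share the same message count, so each count should probably map to a list of users rather than a single one. `invert_counts` and the sample data are hypothetical.

```python
from collections import defaultdict


def invert_counts(visits):
    """Map each message count to the list of users who have that count."""
    users_by_count = defaultdict(list)
    for user, count in visits.items():
        users_by_count[count].append(user)
    return users_by_count


# Made-up sample of the per-user counts from the question.
visits = {"alice": 3, "bob": 1, "carol": 3}
users_by_count = invert_counts(visits)

# Walk the counts from highest to lowest; sort names only to make ties stable.
ranked = [(count, sorted(users_by_count[count]))
          for count in sorted(users_by_count, reverse=True)]
```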

Vicki Laidler