What I'm looking for, essentially, is this SQL translated into Google App Engine (for Java) terms:

select count(*) from Customers

Seems simple enough, but from reading the documentation it looks like I would have to run a query that matches all Customers, loop through the results, and count them, taking paging into account. I do not want to retrieve each and every entity; I just want to count them.

Alternatively, there is an API for iterating over all entities of a given kind (I can't find the exact API at the moment), but that also seems quite inefficient, not to mention that datastore calls count against a limited quota.

Any hints would be appreciated.

Thanks, Mark

A: 

Unfortunately, it's impossible for BigTable to count entities without running queries that match all of them. Keep in mind that applications like Google Search and Google Reader won't even give you exact result counts beyond 1000. If you absolutely, positively think you need to count all of your entities, you can run a series of keys_only queries limited to 1000 entities each and add up the counts.
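For illustration, here is a minimal Java sketch of that batching approach, assuming the low-level datastore API and query cursors are available (the class and method names are made up for the example):

    import com.google.appengine.api.datastore.Cursor;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.FetchOptions;
    import com.google.appengine.api.datastore.PreparedQuery;
    import com.google.appengine.api.datastore.Query;
    import com.google.appengine.api.datastore.QueryResultList;

    public class KindCounter {

        // Counts all entities of a kind by paging through keys-only results
        // in batches of 1000 and summing the page sizes.
        public static int countAll(String kind) {
            DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
            PreparedQuery prepared = datastore.prepare(new Query(kind).setKeysOnly());

            int total = 0;
            Cursor cursor = null;
            while (true) {
                FetchOptions options = FetchOptions.Builder.withLimit(1000);
                if (cursor != null) {
                    options.startCursor(cursor);  // resume where the previous page ended
                }
                QueryResultList<Entity> page = prepared.asQueryResultList(options);
                total += page.size();
                if (page.size() < 1000) {
                    return total;  // last (partial) page reached
                }
                cursor = page.getCursor();
            }
        }
    }

Something like KindCounter.countAll("Customers") then gives the total, at the cost of roughly one datastore round trip per 1000 entities.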

Wooble
A: 

This is just speculation, but I think they will implement a count() method in Java similar to the Python implementation. Here is the documentation for Python's count() method:

count(limit)

Returns the number of results this query fetches.

count() is somewhat faster than retrieving all of the data by a constant factor, but the running time still grows with the size of the result set. It's best to only use count() in cases where the count is expected to be small, or specify a limit.

Note: count() returns a maximum of 1000. If the actual number of entities that match the query criteria exceeds the maximum, count() returns a count of 1000.

Arguments:

limit

    The maximum number of results to count.
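For comparison on the Java side, a hedged sketch: the low-level datastore API's PreparedQuery exposes a countEntities() call that plays roughly the same role as the Python count() quoted above, historically with the same 1000-result cap.

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Query;

    public class CountSketch {
        public static void main(String[] args) {
            DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
            // The count is computed by the datastore; no entities are
            // fetched into the application.
            int customers = datastore.prepare(new Query("Customers")).countEntities();
            System.out.println("Customers: " + customers);
        }
    }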
mcotton
Will this return an accurate count when there are > 1000 results?
Paul Tarjan
+3  A: 

As Wooble says, BigTable doesn't support row counts as a fundamental concept. You can write a wrapper function, as mcotton suggests, but, as he quotes from the docs, it will still be limited to at most 1000.

To overcome these limits you'll need to keep, for each kind of entity you want to count, a counter that gets incremented every time a new entity of that kind is put and decremented whenever an entity of that kind is deleted.

To keep your app highly scalable you'll probably want to shard such counters; see http://code.google.com/appengine/articles/sharding_counters.html (unfortunately I'm not aware of a translation of that recipe to Java, but the concepts are the same).
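By way of illustration, here is a rough (untested) Java translation of that sharded-counter idea using the low-level datastore API; the kind name "CounterShard", the property names, and the shard count are all assumptions made for the sketch:

    import java.util.Random;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.EntityNotFoundException;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;
    import com.google.appengine.api.datastore.Query;
    import com.google.appengine.api.datastore.Transaction;

    // Sharded counter: each write goes to one of NUM_SHARDS entities picked at
    // random, so concurrent increments rarely contend on the same entity group.
    public class ShardedCounter {
        private static final int NUM_SHARDS = 20;
        private static final Random RANDOM = new Random();

        private final String counterName;

        public ShardedCounter(String counterName) {
            this.counterName = counterName;
        }

        // Call with +1 after putting a new entity of the counted kind,
        // and with -1 after deleting one.
        public void increment(long delta) {
            DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
            Key shardKey = KeyFactory.createKey("CounterShard",
                    counterName + "_" + RANDOM.nextInt(NUM_SHARDS));
            Transaction tx = datastore.beginTransaction();
            try {
                Entity shard;
                try {
                    shard = datastore.get(tx, shardKey);
                } catch (EntityNotFoundException e) {
                    shard = new Entity(shardKey);
                    shard.setProperty("name", counterName);
                    shard.setProperty("count", 0L);
                }
                shard.setProperty("count", (Long) shard.getProperty("count") + delta);
                datastore.put(tx, shard);
                tx.commit();
            } finally {
                if (tx.isActive()) {
                    tx.rollback();
                }
            }
        }

        // Summing the shards stays cheap because the number of shards is small.
        public long getCount() {
            DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
            Query query = new Query("CounterShard")
                    .addFilter("name", Query.FilterOperator.EQUAL, counterName);
            long total = 0;
            for (Entity shard : datastore.prepare(query).asIterable()) {
                total += (Long) shard.getProperty("count");
            }
            return total;
        }
    }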

Alex Martelli
A: 

As mcotton said, it appears that count() on a "SELECT __key__" query with no limit may do what you want.

http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_count

This is a relatively new feature in the Google datastore, though. There used to be a hard limit of 1000 on this; it was only recently removed. The only constraint now is whether your query executes quickly enough to avoid timing out.

There's also the new Google Mapper API you could consider if this is a truly huge amount of data and you do hit timeouts. To read more on that, do a Google search for [appengine mapreduce].

I agree that it is pretty amazing that GQL doesn't support "SELECT COUNT(*)"; that seems like a bit of an oversight. But selecting only the key and then using count(), so the keys are never sent back to the app, should behave similarly.
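In Java terms, a hedged sketch of that last point: run a keys-only query and ask the datastore for the count via countEntities with explicit fetch options, so neither entities nor keys travel back to the app (on older SDK versions the result was still capped at 1000, so passing an explicit large limit is a reasonable fallback):

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.FetchOptions;
    import com.google.appengine.api.datastore.Query;

    public class KeyCountSketch {
        public static void main(String[] args) {
            DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
            // Equivalent in spirit to count() over "SELECT __key__ FROM Customers".
            Query query = new Query("Customers").setKeysOnly();
            int customers = datastore.prepare(query)
                    .countEntities(FetchOptions.Builder.withDefaults());
            System.out.println("Customers: " + customers);
        }
    }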

Greg Linden