views: 292

answers: 2
Where does Google store the logs when you make a logging statement? Logging statements seem to be pretty fast, so it doesn't seem like they are stored in the datastore.

How reliable are the logs? If I make a logging call and it succeeds, is it pretty much guaranteed that the entry will show up in the logs?

How much log history is stored?

The reason I'm interested in this is that I'm making a question-and-answer website, and I want to keep track of views of each question by each unique logged-in user, and display the view count on the question page. So if 10 different users visit the question page 100 times, it still only counts as 10 unique views.

I have an offsite computer that does background processing for my app. I'm planning to have this offsite computer download the logs roughly every 30 minutes and calculate what the view count should be for each question based on the logs. By doing this, I don't have to create a datastore entity for every question each user views.
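Something like this is what I have in mind for the offsite script. The VIEW line format is just something I'd emit myself with logging.info(), and I'm assuming the log file has already been downloaded (for example with appcfg.py request_logs):

```python
# Sketch of the offsite aggregation step. Assumes the downloaded log file
# contains lines I logged myself with logging.info(), in a made-up format:
#   VIEW question_id=123 user_id=abc
import re
from collections import defaultdict

VIEW_RE = re.compile(r"VIEW question_id=(\S+) user_id=(\S+)")

def unique_views(log_path):
    """Return a dict mapping question_id -> set of user_ids seen in the logs."""
    viewers = defaultdict(set)
    with open(log_path) as f:
        for line in f:
            match = VIEW_RE.search(line)
            if match:
                question_id, user_id = match.groups()
                viewers[question_id].add(user_id)
    return viewers

if __name__ == "__main__":
    views = unique_views("request_logs.txt")
    for question_id, users in views.items():
        print("%s: %d unique views" % (question_id, len(users)))
```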

What do you guys think? Does anyone see any problems with this?

EDIT: I guess my main concern is the reliability of the logs.

A: 

http://code.google.com/appengine/articles/logging.html

JustJen
Yes, I've read that. I already know how to log things, and access the logs.
Kyle
Telling someone to RTFM only works if the manual includes the answer to their question, which in this case it doesn't.
Peter Recore
+3  A: 

This isn't an answer to your question - rather, it's a response to the problem you are trying to solve.

If you're familiar with Bloom Filters and Memcached's incr (or a sharded datastore counter), you can create a solution that is "good enough". Use a Bloom Filter to test whether a value (in this case, a user ID) is already in the set; if it isn't, increment your counter and add the value to the filter. One of the properties of Bloom Filters is that both adding a value and checking it for membership are constant-time operations. It takes a bit of space to store a filter per question, but this already seems to be an order of magnitude less complex than writing code that periodically greps the logs for uniques. Here's a Python implementation.
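A rough sketch of the check-then-increment idea (the BloomFilter class here is just a toy in-memory stand-in, not the linked implementation, and on App Engine the filter bits would also need to be persisted somewhere like memcache or the datastore):

```python
import hashlib

class BloomFilter(object):
    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into a single integer

    def _positions(self, value):
        # Derive num_hashes bit positions from an MD5 digest of the value.
        digest = hashlib.md5(value.encode("utf-8")).hexdigest()
        for i in range(self.num_hashes):
            chunk = digest[i * 8:(i + 1) * 8]
            yield int(chunk, 16) % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= (1 << pos)

    def __contains__(self, value):
        return all(self.bits & (1 << pos) for pos in self._positions(value))


def count_view(bloom, question_id, user_id):
    """Increment the per-question counter only for (probably) new viewers."""
    # Only works inside App Engine; uses the standard memcache API.
    from google.appengine.api import memcache
    key = "views:%s" % question_id
    if user_id not in bloom:      # false positives possible, never negatives
        bloom.add(user_id)
        memcache.add(key, 0)      # no-op if the counter already exists
        memcache.incr(key)
```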

Nothing is free, however - I said "good enough" for a reason. With Bloom Filters there is always a chance of a false positive. That is, depending on the size of the filter per question, there is a small chance you will check whether a user ID has already been counted and get a "YES IT HAS" when this is actually the first time that user has viewed the question. You can calculate the size you need for a reasonable false-positive rate, but there is a space tradeoff for doing so.
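To make the tradeoff concrete, the standard Bloom Filter sizing formulas can be computed directly (nothing App Engine specific here, and the example numbers are only illustrative):

```python
# For n expected items and target false-positive rate p, the required number
# of bits is m = -n * ln(p) / (ln 2)^2 and the optimal number of hash
# functions is k = (m / n) * ln 2.
import math

def bloom_size(n, p):
    m = int(math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
    k = int(round((float(m) / n) * math.log(2)))
    return m, k

# e.g. 1,000 unique viewers per question at a 1% false-positive rate needs
# roughly 9,600 bits (about 1.2 KB) and 7 hash functions.
print(bloom_size(1000, 0.01))
```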

Ikai Lan
Thanks Ikai, I considered using memcache and asked this question about it: http://stackoverflow.com/questions/2422131/google-app-engine-memcache-how-likely-am-i-to-lose-data-in-this-scenario . All of the responses I received said I shouldn't rely on memcache for temporary data storage; I should only use it as a cache. I actually came up with a pretty nice solution that uses the request logs to figure out the data I need :).
Kyle
Also, I tried using the datastore ( http://stackoverflow.com/questions/2427442/google-app-engine-about-how-much-quota-does-a-single-datastore-put-use ) and figured it was too expensive, especially considering the data I need is already stored in the request logs.
Kyle