Hello, I'm building a site with Django that lets users move content around between a bunch of photo services. As you can imagine, the application makes a lot of API hits.
For example: a user connects Picasa, Flickr, Photobucket, and Facebook to their account. Now we need to pull content from 4 different APIs to keep this user's data up to date.
Right now I have a function that updates each API, and I run them all simultaneously via threading. (All the APIs that are not enabled return False on the second line, so it's not much overhead to run them all.)
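To make the setup concrete, here is a minimal sketch of the "one updater per service, all run in threads" approach. The function names and the `user` dict are hypothetical stand-ins for my actual code; each updater bails out with False when its service isn't enabled:

```python
import threading

# Hypothetical per-service updaters; each returns False immediately
# if the service is not enabled for this user (the "second line" check).
def update_flickr(user):
    return user.get("flickr", False)

def update_picasa(user):
    return user.get("picasa", False)

def update_photobucket(user):
    return user.get("photobucket", False)

def update_facebook(user):
    return user.get("facebook", False)

UPDATERS = [update_flickr, update_picasa, update_photobucket, update_facebook]

def update_all(user):
    """Run every updater in its own thread, wait for all, collect results."""
    results = {}

    def run(fn):
        # Each thread writes to its own distinct key, so this is safe.
        results[fn.__name__] = fn(user)

    threads = [threading.Thread(target=run, args=(fn,)) for fn in UPDATERS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Since the updaters are I/O-bound (waiting on remote APIs), threads overlap the network latency even with the GIL.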
Here is my question:
What is the best strategy for keeping content up to date using these APIs?
I have two ideas that might work:
Update the APIs periodically (like a cron job), and whatever we have at the time is what the user gets.
benefits:
- It's easy and simple to implement.
- We'll always have pretty good data when a user loads their first page.
pitfalls:
- We have to do API hits all the time for users that are not active, which wastes a lot of bandwidth.
- It will probably make the API providers unhappy.
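One way to soften the inactive-user pitfall of the cron approach is to have the periodic job skip dormant accounts entirely. A sketch of that filter, assuming each user record tracks a last login and a last refresh time (field names are my invention here):

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=6)    # refresh cached data older than this
ACTIVE_WINDOW = timedelta(days=7)   # ignore users idle longer than this

def users_to_refresh(users, now=None):
    """Pick users whose cached data is stale but who logged in recently,
    so the cron job doesn't burn API hits on inactive accounts."""
    now = now or datetime.utcnow()
    return [
        u for u in users
        if now - u["last_login"] <= ACTIVE_WINDOW
        and now - u["last_refresh"] >= STALE_AFTER
    ]
```

The cron job would then only hit the APIs for the returned subset, which also keeps request volume lower for the providers.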
Trigger the updates when the user logs in (on a page load).
benefits:
- We save a bunch of bandwidth and run less risk of annoying the API providers.
- It doesn't require NEARLY as many resources on our servers.
pitfalls:
- We either have to do the update asynchronously (and won't have anything to show on first login), or...
- The first page will take a very long time to load because we're fetching all the API data synchronously (I've measured 26 seconds this way).
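For the asynchronous variant of this option, the login handler can fire off the refresh in a background thread and render the page immediately from whatever is cached. A minimal sketch (the function names are placeholders, not my real code; a task queue like Celery would be the more robust version of the same idea):

```python
import threading

def refresh_in_background(user, update_fn):
    """Kick off the API refresh without blocking the page load.

    The first render serves cached data; the daemon thread fills in
    fresh content that later page loads (or an AJAX poll) can pick up.
    """
    t = threading.Thread(target=update_fn, args=(user,), daemon=True)
    t.start()
    return t  # caller can join() in tests, or ignore it in production
```

This trades the 26-second first page for a fast page with possibly empty/stale content on the very first login.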
Edit: the page design is very light: only two images, an external CSS file, and two external JavaScript files.
Also, the 26-second number comes from the Firebug network monitor, running on a machine that was on the same LAN as the server.