Hi,

I need to poll a web service, in this case Twitter's API, and I'm wondering what the conventional wisdom is on this topic. I'm not sure whether this is important, but I've always found feedback useful in the past.

A couple of scenarios I've come up with:

  1. The querying process starts every X seconds, e.g. a cron job runs a Python script

  2. A process continually loops and queries at each iteration, e.g. ... well, here is where I enter unfamiliar territory. Do I just run a Python script that doesn't end?

Thanks for your advice.

ps - regarding the particulars of Twitter: I know that it sends emails for follows and direct messages, but sometimes one might want the flexibility of parsing @replies. In those cases, I believe polling is as good as it gets.

pps - Twitter limits bots to 100 requests per 60 minutes. I don't know if this also limits web scraping or RSS feed reading. Does anyone know how easy or hard it is to be whitelisted?

Thanks again.

A: 

You should have a page that acts as a ping or heartbeat page. Then you have another process that "tickles" (hits) that page; you can usually set this up in your web host's control panel, or use cron if you have local access. The script can record how often it has polled in a database or some other data store, and then poll the service only as often as you really need to, staying within the provider's limit. You definitely don't want to write (and certainly don't want to rely on) a Python script that "doesn't end." :)
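As a rough sketch of the "keep statistics in a data store" idea, the script that cron runs could log each poll in SQLite and refuse to poll once the limit from the question (100 requests per 60 minutes) is reached. The table name `polls` and the helpers `open_store`, `may_poll`, and `run_once` are made up for illustration, and `poll` stands in for whatever function actually calls the service.

```python
import sqlite3
import time

MAX_REQUESTS = 100      # limit quoted in the question
WINDOW_SECONDS = 3600   # 60 minutes


def open_store(path=":memory:"):
    """Open (or create) the table that records when each poll happened."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS polls (ts REAL NOT NULL)")
    return conn


def may_poll(conn, now):
    """Return True if fewer than MAX_REQUESTS polls fall inside the window."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM polls WHERE ts > ?", (now - WINDOW_SECONDS,)
    ).fetchone()
    return count < MAX_REQUESTS


def run_once(conn, poll, now=None):
    """Entry point for the cron job: poll only if the rate limit allows it."""
    now = time.time() if now is None else now
    if not may_poll(conn, now):
        return False
    # Record the attempt first so a failed request still counts against the limit.
    conn.execute("INSERT INTO polls (ts) VALUES (?)", (now,))
    conn.commit()
    poll()
    return True
```

With a file path instead of `":memory:"`, the counts survive between cron runs, so the schedule can be as frequent as you like without exceeding the limit.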

BobbyShaftoe
Thanks for replying. When you say "page", do you mean a web page? I do have a server which is running a website, but I can interact with the backend Python and database directly without sending data to a URL. I'm sorry if I misunderstand; I'm trying to see if I'm missing something, e.g. with respect to efficiency.
Oh I see, I think I misunderstood you. A cron job would be OK. The only problem with a continually running script is that you need some way to make sure it is still running, in case it fails unexpectedly.
BobbyShaftoe
Cool.
+4  A: 

"Do I just run a python script that doesn't end?"

How is this unfamiliar territory?

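import time

polling_interval = 36.0  # seconds: 100 requests per 3600 seconds
running = True
while running:
    start = time.monotonic()  # time.clock() was removed in Python 3.8
    poll_twitter()  # placeholder for your own API call
    anything_else_that_seems_important()
    work_duration = time.monotonic() - start
    # Clamp at zero: sleep() rejects negative values if the work overran.
    time.sleep(max(0.0, polling_interval - work_duration))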

It's just a loop.
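If you want the loop above to survive a failed request and to be stoppable, one possible shape is the sketch below. The name `run_poller` and its parameters are invented for this example; `poll` is whatever function queries the service, and `clock` and `sleep` are injectable only so the loop can be exercised without real waiting.

```python
import time


def run_poller(poll, interval=36.0, should_stop=lambda: False,
               clock=time.monotonic, sleep=time.sleep):
    """Call poll() roughly every `interval` seconds until should_stop() is true."""
    while not should_stop():
        start = clock()
        try:
            poll()
        except Exception as exc:
            # One failed request should not kill a long-running process.
            print("poll failed:", exc)
        elapsed = clock() - start
        # If the work took longer than the interval, do not sleep at all.
        sleep(max(0.0, interval - elapsed))
```

Calling `run_poller(poll_twitter)` with the defaults runs until the process is killed, which is the "script that doesn't end" from the question.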

S.Lott
And now you need a cron job to make sure that this script is alive.
J.F. Sebastian
Or a server monitoring application. Most servers run "continually" and no one has issues with them. That said, they're usually more complex than this little example, so just run the body of the loop as a cron job.
gbjbaanb
Besides, how do you ensure that cron is always running? :)
gbjbaanb
If cron stops, I think that means the OS has stopped. You want this to be started (and restarted) by inittab, not cron.
S.Lott
Last time I wrote one of these, it ran 24x7 for about 5 years without failure or interruption. The "risk of failure" is microscopic.
S.Lott
Word. This just seemed hackish, so I wanted to bounce it off others. Thanks.
Let's see, Apache works like this. MySQL works like this. All "server" things are just big loops that run until you stop them.
S.Lott