Hey there,

I've finally had a second to look into streaming, daemons, and cron tasks and all the neat gems built around them! But I'm not clear on how/when to use these things.

I have a few questions:

1) If I wanted to have a website that stayed constantly updated, in real time, with my Facebook friends' activity feeds, up-to-the-minute Amazon book reviews on my favorite books, and my Twitter feed, would I just create some custom streaming implementation using the daemons gem, the ruby-yali gem for streaming the content, and the whenever gem, which could, say, check those sites every 3-10 seconds to see if the content I'm looking for has changed? Is that how it would work? Or is it typically/preferably done differently?

2) Is (1) too processor intensive? Is there a better way to do it, a better way for live content streaming, given that the website you want realtime updates from doesn't have a streaming API? I'm thinking about just sending a request every few seconds from a separate small ruby app (with daemons and cron jobs), getting the JSON/XML result, using nokogiri to strip out the stuff I don't need, and then just going through the small list of comments/books/posts/etc., building a feed of what's changed, and using Juggernaut or something to push those changes to some rails app -- something like the sketch below. Would that work?
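
To make (2) concrete, here's the kind of thing I'm imagining with the daemons and nokogiri gems. The feed URL and XPath are totally made up, just placeholders for whatever site I'd actually be polling:

    require 'rubygems'
    require 'daemons'
    require 'net/http'
    require 'nokogiri'

    # Runs as a background daemon and polls a (hypothetical) feed.
    Daemons.run_proc('feed_poller') do
      last_seen = nil
      loop do
        xml = Net::HTTP.get(URI.parse('http://example.com/feed.xml'))
        doc = Nokogiri::XML(xml)
        newest = doc.at('//item/title') # first matching node
        if newest && newest.text != last_seen
          last_seen = newest.text
          # here I'd build the "what changed" feed and hand it to
          # Juggernaut (or whatever) to push out to the Rails app
          puts "new item: #{last_seen}"
        end
        sleep 5 # poll every few seconds
      end
    end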

I guess it all boils down to the question:

How does real-time streaming of the latest content from some website work? How do YOU do it? ...so that if someone is on my site, they can see in real time the new message or new book that just came out?

Looking forward to your answers, Lance

+1  A: 

Well, first: if a website doesn't provide an API, that's a strong indication that parsing and extracting their data isn't legal. Either way, you'd better check their terms of use and privacy policy.

Personally I'm not aware of something called a "Streaming API", but supposing they do have an API, you still need to pull the results it provides (XML, JSON, ...), parse them, and present them back to the user. The strategy will vary depending on your app type:

  1. Desktop app: you can just pull the data directly, parse it, and present it to the user; many apps work like that, Twhirl for example.
  2. Web app: here you need to cut down the time spent extracting the data. Typically you will pull the data from the API and store it. However, storing the data is a bit tricky! You don't want your database to become a bottleneck for the app under the heavy pull queries it will get when serving that data back. One way to handle this is a push methodology: pull the data as described, then push the changes to the user instead of making every browser query for them. If you want instant updates, chat style, have a look at orbited. If it's OK to save the data to some kind of user and followers' 'inboxes', then the simplest way, as far as I can tell, is to use IMAP to send the updates to the user's inbox. A rough sketch of the pull-then-push flow follows.
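
Here's a very rough sketch of that pull-then-store-then-push idea for the web app case. The endpoint, the JSON shape, and the Item model are all hypothetical, just to show the moving parts:

    require 'rubygems'
    require 'net/http'
    require 'json'

    # Pull the newest items from the (imaginary) remote API.
    def fetch_updates(since_id)
      body = Net::HTTP.get(URI.parse("http://example.com/api/items.json?since=#{since_id}"))
      JSON.parse(body) # assume it returns an array of item hashes
    end

    last_id = 0
    fetch_updates(last_id).each do |item|
      last_id = item['id']
      # Store it locally (e.g. Item.create(:title => item['title']) with
      # ActiveRecord) so your pages read from your own database, then
      # push the change out (Juggernaut, orbited, ...) instead of letting
      # every browser hammer the API or your database on each page view.
    end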
khelll
good point about the terms of use and privacy policy. I'm more wondering whether or not this will be usable (will it be fast enough, won't it use up too many resources or cost too much to keep processing this stuff every few seconds, etc.). Not interested in chat right now, more just page scraping at regular (second) intervals. Thanks for the tips.
viatropos