I'm trying to write code to process the Twitter spritzer stream, and from the advice I've found, Ruby's EventMachine/em-http-request seems like a good way to do it. The basic code is below. I'm getting OK performance, but I don't believe it's able to keep up with the stream (despite running on a fairly well-specced AWS instance).
Code:
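require 'eventmachine'
require 'em-http-request'
require 'json'

# url, user, password and handle_tweet are defined elsewhere
EventMachine.run do
  http = EventMachine::HttpRequest.new(url).get :head => { 'Authorization' => [user, password] }
  buffer = ""
  # Chunks don't line up with tweets, so accumulate them and
  # only parse complete, newline-terminated lines
  http.stream do |chunk|
    buffer << chunk  # append in place instead of building a new string each time
    while (idx = buffer.index("\n"))
      line = buffer.slice!(0, idx + 1).strip
      next if line.empty?  # skip blank keep-alive lines; JSON.parse would raise on them
      handle_tweet JSON.parse(line)
    end
  end
  http.errback {
    puts "Oops"
  }
end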
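The chunk-to-line splitting done inside the stream callback can be pulled out into a small class so it can be exercised without a live connection. This is a minimal sketch, assuming the stream is newline-delimited JSON with occasional blank keep-alive lines; `LineBuffer` is a hypothetical helper, not part of em-http-request, and it uses only the Ruby standard library.

```ruby
require 'json'

# Hypothetical helper (not part of em-http-request): accumulates raw
# chunks and yields each complete, non-blank line as parsed JSON.
class LineBuffer
  def initialize
    @buffer = +""  # unfrozen, mutable string
  end

  # Append a chunk and yield every complete line it finishes.
  def push(chunk)
    @buffer << chunk
    while (idx = @buffer.index("\n"))
      line = @buffer.slice!(0, idx + 1).strip
      next if line.empty?  # blank keep-alive lines carry no tweet
      yield JSON.parse(line)
    end
  end

  # Bytes received but not yet terminated by a newline.
  def pending
    @buffer.dup
  end
end

# Usage: one tweet split across two chunks, then a keep-alive line,
# then the start of a second tweet that has not finished arriving.
tweets = []
lb = LineBuffer.new
lb.push(%({"id": 1, "te)) { |t| tweets << t }
lb.push(%(xt": "hi"}\r\n\r\n{"id": 2)) { |t| tweets << t }
puts tweets.size          # prints 1
puts tweets.first["text"] # prints hi
puts lb.pending           # prints {"id": 2
```

Inside `http.stream`, the loop body would then reduce to `lb.push(chunk) { |tweet| handle_tweet tweet }`.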
Questions:
The handle_tweet callback runs 3 SELECT statements followed by 3 INSERT/UPDATE queries.
These calls are blocking, since they go through the DBI library. Does it matter that they block inside the asynchronous callback?
Can multiple invocations of handle_tweet be pending at the same time, even though only one actually runs at any given moment?
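As a general property of EventMachine (not something measured in this code), callbacks run one after another on a single reactor thread, so a blocking DBI call inside `handle_tweet` also stops the reactor from reading further chunks until it returns. One common way to keep the stream callback fast is to hand each tweet to worker threads that do the blocking database work; EventMachine's own `EM.defer` runs work on a thread pool for this purpose. The sketch below swaps in plain standard-library `Thread` and `Queue` so it runs without EventMachine. `TweetWorkers` and `process_tweet` are hypothetical names, and `process_tweet` only simulates the DBI queries with a `sleep`.

```ruby
# Hypothetical sketch: the stream callback only enqueues, and a fixed
# pool of worker threads performs the blocking database work.
class TweetWorkers
  def initialize(size, &work)
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until an item arrives; :stop ends this worker
        while (tweet = @queue.pop) != :stop
          work.call(tweet)
        end
      end
    end
  end

  # Cheap and non-blocking, so safe to call from the reactor callback.
  def <<(tweet)
    @queue << tweet
    self
  end

  # Queue one :stop per worker behind any pending tweets, then wait.
  def shutdown
    @threads.size.times { @queue << :stop }
    @threads.each(&:join)
  end
end

# Stand-in for handle_tweet's SELECT/INSERT/UPDATE calls (hypothetical).
processed = Queue.new
process_tweet = lambda do |tweet|
  sleep 0.01  # simulate a blocking DBI round trip
  processed << tweet["id"]
end

workers = TweetWorkers.new(4, &process_tweet)
10.times { |i| workers << { "id" => i } }  # what http.stream would do per tweet
workers.shutdown
puts processed.size  # prints 10
```

Two caveats with this design: tweets may be written in a different order than they arrived, and each worker would likely need its own database handle rather than sharing one DBI connection across threads.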
Lastly, is there a disconnect callback for em-http-request? I saw one mentioned in the EventMachine documentation, but it doesn't seem to exist in em-http-request. I'm sure there's something for it; maybe I missed it?