I am building a crawler that fetches information in parallel from a number of websites in real-time, in response to a client's request for that information. I need to request specific pages from 10-20 websites, parse their contents for specific snippets of information, and return those snippets to the client as fast as possible. I want to do it asynchronously, so the client sees the first result as soon as it is ready, while the other requests are still pending.
I have a Ruby background, and would therefore prefer to build the solution in Ruby - however, parallelism and speed are exactly what Ruby is known NOT to excel at. I believe that libraries such as EventMachine and Typhoeus can remedy that, but I am also strongly considering node.js, because I know JavaScript quite well and it seems to be built for exactly this kind of thing.
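To frame the question, this is roughly the kind of parallel fetching I have in mind on the Ruby side, using Typhoeus's Hydra (just a sketch; the parsing step is a placeholder and defaults may differ between Typhoeus versions):

```ruby
require 'typhoeus'

def fetch_snippets(urls)
  hydra = Typhoeus::Hydra.new   # runs queued requests in parallel via libcurl

  urls.each do |url|
    request = Typhoeus::Request.new(url)
    request.on_complete do |response|
      if response.success?
        # parse response.body for the snippet I need and hand it to the client
      else
        # log/ignore failures so one slow or broken site doesn't hold up the rest
      end
    end
    hydra.queue(request)
  end

  hydra.run   # blocks here, but the on_complete callbacks fire as each request finishes
end
```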
Whatever I choose, I also need an efficient way to communicate the results back to the client. I am considering plain AJAX (but that would require polling the server), WebSockets (but that would require a fallback for older browsers), and specific solutions for persistent client/server communication such as Cramp, Juggernaut and Pusher.
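For the push side, this is a minimal sketch of what I imagine with EventMachine and the em-websocket gem (`fetch_snippets_async` is a hypothetical hook into the crawler above, and this ignores the fallback problem for older browsers):

```ruby
require 'eventmachine'
require 'em-websocket'
require 'json'

EventMachine.run do
  EventMachine::WebSocket.start(:host => '0.0.0.0', :port => 8080) do |ws|
    ws.onopen do
      # Start the crawl when the client connects and push each parsed
      # snippet down the socket the moment it is ready.
      fetch_snippets_async do |snippet|   # hypothetical crawler callback
        ws.send({ :result => snippet }.to_json)
      end
    end
  end
end
```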
Does anyone have any experience and/or recommendations they would like to share?