In Episode 78 of the Joel & Jeff podcast, one of the Doctype/Litmus guys states that you would never want to build a spider in Ruby. Would anyone like to guess at his reasoning for this?
You wouldn't get the desired performance out of Ruby. See the referenced link: http://blog.dhananjaynene.com/2008/07/performance-comparison-c-java-python-ruby-jython-jruby-groovy/
While performance tests like these should be taken with a grain of salt, there is a considerable speed difference between Ruby and the fastest languages.
Edit: Shame on me for answering a loaded question. All in all, choosing a language is a series of trade-offs, spanning from raw performance to personal preference for the languages you are most productive in. The beauty of programming is that all of these languages are available for you to use, so you can test what works best for the requirements of your project. My recommendation is to experiment and see what works best for you.
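In that spirit of experimenting, Ruby's standard `benchmark` library makes it easy to time the workload you actually care about instead of relying on a generic cross-language shootout. A minimal sketch (the string-manipulation workload here is an arbitrary stand-in for your real scraping code):

```ruby
require 'benchmark'

# Time 100,000 iterations of a stand-in workload and report wall-clock time.
n = 100_000
t = Benchmark.measure { n.times { 'hello world'.upcase.reverse } }
puts format('%.4f seconds for %d iterations', t.real, n)
```

Swap in your own parsing or extraction code inside the block to see whether Ruby is actually the bottleneck for your use case.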
What OG said. In simpler terms, Ruby is dog slow and if you're looking to get a lot done per unit time, it's the wrong choice of language.
Just how fast does a crawler need to be, anyhow? It depends upon whether you're crawling the whole web on a tight schedule, or gathering data from a few dozen pages on one web site.
With Ruby and the Nokogiri library, I can read this page and parse it in 0.01 seconds. Using XPath to extract data from the parsed page, I can turn all 223 rows into domain-specific objects in 0.16 seconds.
I am running into fewer and fewer problems where the traditional constraints (CPU/memory/disk) matter. This is an age of plenty. Where resources are not a constraint, don't ask "what's better for the machine?" Ask "what's better for the human?"
In my opinion, it's just a matter of scale. If you're writing a simple scraper for your own personal use, or just something that will run on a single machine a couple of times a day, then you should choose something that involves less code, effort, and maintenance pain. Whether that's Ruby is a different question (I'd pick Groovy over Ruby for this task: better threading plus very convenient XML parsing). If, on the other hand, you're scraping terabytes of data per day, then the throughput of your application is probably more important than a shorter development time.
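For the single-machine case, even Ruby's core `Thread` and `Queue` classes get you a serviceable worker pool, since a scraper spends most of its time blocked on I/O anyway. A network-free sketch (the URLs and the "fetched" payload are placeholders for real HTTP requests and parsing):

```ruby
# URLs to scrape go into a thread-safe queue; a few workers drain it.
urls  = (1..10).map { |i| "http://example.com/page#{i}" }
queue = Queue.new
urls.each { |u| queue << u }

results = Queue.new
workers = 4.times.map do
  Thread.new do
    loop do
      url = queue.pop(true) rescue break # non-blocking pop; exit when drained
      results << [url, 'fetched']        # stand-in for an HTTP GET + parse
    end
  end
end
workers.each(&:join)

puts "processed #{results.size} pages"
```

Because the Global VM Lock is released while threads wait on sockets, this kind of pool overlaps network latency well even in plain MRI Ruby.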
BTW, anyone who says that you would never want to use some technology in some context or another is most probably wrong.