I love Ruby and its framework, but I don't think that Ruby on Rails is the best choice for developing a feed parser and indexer.

Maybe Python or Java would be better choices. What language do you suggest?

A:

A feed (RSS?) is usually pretty well structured, at least compared to a regular web page. Check out Web Harvest, a Java/BeanShell-based DOM parser (among other things). You can use it to automate grabbing data off the internet. It has a domain-specific language (defined in XML) that you'll have to learn. Its learning curve might be a bit steep, but I felt it was well worth the effort.
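Web Harvest itself is configured through its own XML language, so it is not reproduced here. To illustrate how regular RSS is, here is a minimal Python sketch that walks a feed with the standard library's `xml.etree.ElementTree` instead. The sample feed and the `parse_rss_items` helper are hypothetical, and the code assumes RSS 2.0 (Atom feeds use a different, namespaced layout).

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample used only for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello feeds</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Parsing RSS is easy</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> in an RSS 2.0 document (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # root element is <rss>
    items = []
    # In RSS 2.0, entries live at rss/channel/item.
    for item in root.iterfind("channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items
```

Because every entry sits at the same fixed path, a short path expression is enough; no heuristics of the kind needed for scraping arbitrary HTML are involved.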

A: 

I am not very familiar with Java, but I can say Python is very well suited for the job.

There is a very fast XML parser called BeautifulStoneSoup that you can use; it is part of the BeautifulSoup library. And if you're only looking for a simple indexer, Python has a built-in SQLite engine (the sqlite3 module), which is also lightweight and very fast.
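As a rough illustration of the "simple indexer" idea, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The schema (`entries` and `terms` tables), the function names, and the naive lowercase tokenizer are all hypothetical choices for this example, and BeautifulStoneSoup is not used here; a real indexer would also want stop words, stemming, and ranking.

```python
import re
import sqlite3


def create_index(conn):
    """Create a tiny inverted index: one row per (term, entry) pair (hypothetical schema)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS entries (
            id INTEGER PRIMARY KEY,
            title TEXT,
            link TEXT
        );
        CREATE TABLE IF NOT EXISTS terms (
            term TEXT,
            entry_id INTEGER REFERENCES entries(id),
            PRIMARY KEY (term, entry_id)
        );
    """)


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase runs of letters and digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def add_entry(conn, title, link, text):
    """Store one feed entry and index the words of its title and text."""
    cur = conn.execute(
        "INSERT INTO entries (title, link) VALUES (?, ?)", (title, link)
    )
    entry_id = cur.lastrowid
    conn.executemany(
        "INSERT OR IGNORE INTO terms (term, entry_id) VALUES (?, ?)",
        [(term, entry_id) for term in tokenize(title + " " + text)],
    )
    conn.commit()
    return entry_id


def search(conn, word):
    """Return titles of entries containing `word`, in insertion order."""
    rows = conn.execute(
        "SELECT e.title FROM entries e JOIN terms t ON t.entry_id = e.id "
        "WHERE t.term = ? ORDER BY e.id",
        (word.lower(),),
    )
    return [title for (title,) in rows]
```

For example, after `conn = sqlite3.connect(":memory:")`, `create_index(conn)`, and adding a couple of entries, `search(conn, "rss")` returns the titles of the entries whose title or text contains that word. Swapping `":memory:"` for a file path makes the index persistent.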

Sahasranaman MS
A:

I think Ruby is just fine for any of these kinds of tasks:

If you are comfortable with Ruby, I see no reason to shell out to Java, Python, etc. for most tasks. Keep in mind that many Ruby libraries sit on top of native implementations.

Sam Saffron