I'm having a difficult time locating an HTML parser that works with JRuby.

I've become fond of using Nokogiri for HTML parsing, but Nokogiri requires libxml2.dll, which isn't available on my machine, and I'm not sure I can guarantee that it will be available on all users' machines.

I attempted to use another favorite, Scrubyt, but that relies on Mechanize, which also requires Nokogiri.

What Ruby HTML parser do you recommend for use with JRuby?

A: 

Why not use the pure-Java version of Nokogiri?

http://github.com/tenderlove/nokogiri/tree/java

cam
The requirements for this version of Nokogiri include libxml2 and libxslt. libxml2.dll is the binary file for libxml2. Do you know of any XML parsers that don't have any binary dependencies?
sutch
+1  A: 

The pure-Java version of Nokogiri does not depend on libxml2 or any other native binary. See http://wiki.github.com/tenderlove/nokogiri/pure-java-nokogiri-for-jruby.

Hpricot is a popular HTML parsing library that has a pure-Java port as well. Its functionality is similar; in fact, Hpricot was the parser that popularized using CSS selectors for HTML parsing.

Mark Thomas