Hi everyone,
I know that Hpricot is still a standard, but I remember hearing about a faster, more expressive HTML parser for Ruby.
Does anybody know what it's called, and whether it is worth switching to from Hpricot?
Thanks in advance
You are probably thinking of Nokogiri. I have not used it myself, but "everyone" is talking about it and the benchmarks do look interesting:
                        user     system      total        real
hpricot:html:doc   48.930000   3.640000  52.570000 ( 52.900035)
hpricot2:html:doc   4.500000   0.020000   4.520000 (  4.518984)
nokogiri:html:doc   3.640000   0.130000   3.770000 (  3.770642)
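If you want to check numbers like these against your own pages, the format above matches what Ruby's standard Benchmark module prints (user, system, total, and real seconds). Here is a rough sketch of such a harness, not the script that produced the figures above. The sample HTML, the labels, and the two stand-in lambdas are all made up so the script runs without extra gems; with the gems installed you would swap in the real parser calls mentioned in the comments.

```ruby
require 'benchmark'

# Hypothetical sample document; replace it with the page you actually parse.
SAMPLE_HTML = "<html><body>" + "<p class='item'>hello</p>" * 1_000 + "</body></html>"

# Label => callable that processes the document once.
# These are stand-ins (assumptions for illustration) so the script has no
# gem dependencies. With the gems installed you would use something like
#   -> { Nokogiri::HTML(SAMPLE_HTML) }  or  -> { Hpricot(SAMPLE_HTML) }
CANDIDATES = {
  'standin:count' => -> { SAMPLE_HTML.count('<') },
  'standin:split' => -> { SAMPLE_HTML.split('<').size }
}

ITERATIONS = 100

# Benchmark.bm prints one row per label and returns an array of
# Benchmark::Tms objects holding the measured times.
results = Benchmark.bm(16) do |bm|
  CANDIDATES.each do |label, work|
    bm.report(label) { ITERATIONS.times { work.call } }
  end
end
```

Run it against a document that looks like your real input; parser speed can vary a lot with page size and how broken the markup is, so someone else's numbers are only a rough guide.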
There is also Rubyful Soup, which sells itself as a lightweight, quick-and-dirty parser. I found the interface very intuitive and 'Ruby-ish' when I used it for a project in the past, which is perhaps a little surprising given that it is a Python port.
Edit: Looks like it's no longer maintained, unfortunately, so it's probably not the one you were looking for. Nokogiri seems to be the one you've been hearing about.
Don't use regular expressions -- Ruby's regex stuff is way too slow. Hpricot is awesome and Nokogiri looks promising, though I've not used it directly yet.