Segmentation fault in hpricot

views:

706

answers:

+1 Q:


I'm using Hpricot to parse HTML and I'm getting a segmentation fault. I googled it, and some people suggest upgrading to the latest version of Ruby. I am using Rails 2.3.2 and Ruby 1.8.7. How can I resolve this error?

Well, based on your own question, I'd say "Upgrade to the latest version of Ruby". However, I've also had problems with hpricot segfaulting, which seemed to be related to my usage of threading.

Adam Wright 2009-05-30 22:17:17

But I'm already using almost the latest version of Ruby. Also, I'm not doing any threading in my code :(

2009-05-30 22:18:43

Alas, not quite. The latest Ruby is 1.9.1.

Adam Wright 2009-05-30 22:21:06

My host is using 1.8.5. Even if I upgrade to 1.9.1 on my dev machine, I won't be able to deploy the code to production.

2009-05-30 22:28:23

Is there any way to catch it?

2009-05-30 22:33:11
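A segfault can't be caught with `rescue`, because it kills the whole Ruby process. One general workaround (not something proposed in this thread) is to do the parsing in a forked child process, so a crash takes down only the child and the parent can log it and move on. A minimal sketch, assuming a Unix-like system where `fork` is available; `run_isolated` is a hypothetical helper name, and whatever parsing you put inside the block is up to you:

```ruby
# Hypothetical helper: run the block in a forked child process so that a
# segfault (or any other fatal signal) kills only the child, not the caller.
# Needs fork, so Unix-like systems only. The block's result must be
# Marshal-able plain data (strings, numbers, arrays, hashes).
#
# Returns one of:
#   [:ok, result]        the block finished normally
#   [:crashed, signo]    the child was killed by signal number signo
#   [:failed, status]    the block raised, or its result couldn't be marshaled
def run_isolated
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = fork do
    reader.close
    begin
      writer.write(Marshal.dump(yield))
      writer.close        # flush explicitly, since exit! skips Ruby's cleanup
      exit!(0)
    rescue Exception
      exit!(1)            # exit! avoids running the parent's at_exit handlers
    end
  end

  writer.close            # the parent keeps only the read end
  data = reader.read      # returns at EOF, once the child's write end is closed
  reader.close
  _, status = Process.wait2(pid)

  if status.signaled?
    [:crashed, status.termsig]
  elsif status.success?
    [:ok, Marshal.load(data)]
  else
    [:failed, status.exitstatus]
  end
end
```

For example, `status, value = run_isolated { Hpricot(html).search("a").length }`, then log and skip the page when `status` is `:crashed`. Ruby's own crash handler may print a bug report and abort, so the parent can see SIGABRT rather than SIGSEGV; that is why the sketch treats any fatal signal as a crash. Forking once per document is slow, so this only contains the crash; it doesn't fix it.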

For clarification, upgrading to 1.9 is probably not the answer. Hpricot works better on 1.8 than on 1.9; there are still some bugs that haven't been worked out in 1.9.

Chuck 2009-06-20 03:13:23

This appears to be an outstanding issue on the bug list. I have experienced it too. My theory is that it has to do with the HTML structure or a bad/corrupt character in the file, but I have not found exactly where.

Here are the links to the issues:

dave elkins 2009-06-20 02:35:31

+3 A:

If you're free to choose your HTML parsing library, switch. _why, the creator of Hpricot, recently posted that you should use Nokogiri instead of Hpricot nowadays.

You may also have a look at HTTParty.

bb 2009-07-25 11:42:31

And he subsequently vanished from the Internet, so for the moment Hpricot appears to be unmaintained.

molf 2009-08-26 17:56:07

I'm having the same segfault issue, but sadly I can't consult the issues Dave cited above, even via Google cache. From what I've found by googling, the parse.rb segfaults have to do with encoded entities or alternate character sets (perhaps accented characters).

The sanitize lib encountered the same issue and posted a monkeypatch here: http://github.com/rgrove/sanitize/blob/1e1dc9681de99e32dc166f591343dfa60fc1f648/lib/sanitize/monkeypatch/hpricot.rb

jamiew 2009-08-26 17:03:47
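Since nobody here could pin down which character triggers the crash, it can help to list where a document contains the two suspects named above: non-ASCII (e.g. accented) characters and encoded entities. A small diagnostic sketch, not a fix; `suspicious_lines` is a hypothetical helper, and the entity pattern is a rough approximation rather than a full HTML entity grammar:

```ruby
# Hypothetical diagnostic: list the lines of an HTML string that contain
# non-ASCII bytes or character entities, the two suspects named in this
# thread. It does not fix anything; it only helps narrow down the input.

# Rough entity pattern: &#233; &#xE9; &eacute; (not a full HTML grammar)
ENTITY = /&(?:#\d+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/

def suspicious_lines(html)
  # Work on raw bytes so invalid or mixed encodings can't raise here.
  bytes = html.dup.force_encoding(Encoding::BINARY)
  hits = []
  bytes.each_line.with_index(1) do |line, number|
    non_ascii = line.bytes.any? { |b| b > 127 }
    entities  = line.scan(ENTITY).uniq
    next unless non_ascii || !entities.empty?
    hits << { line: number, non_ascii: non_ascii, entities: entities }
  end
  hits
end
```

Feeding the parser only the reported lines (or halving the document until the crash disappears) is one way to find the exact input that triggers the segfault.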

From memory, since I last used it about a year ago:

Hpricot stores attributes in a fixed-size buffer, and some frameworks generate outrageously long hashes in document attributes. There's some static field you can set before parsing that lets you set the size of this buffer.

I remember it being fairly prominent in the docs on the webpage, though of course it's gone now.

Ken 2009-08-26 17:25:11
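Ken's buffer theory is easy to check before blaming the parser: a page with a very long tag or attribute value (for example, a framework-generated hash) is a candidate for overflowing a fixed-size buffer. A rough, regex-based sketch; `buffer_report` is a hypothetical helper and knows nothing about Hpricot's actual buffer size:

```ruby
# Rough diagnostic for the fixed-size-buffer theory: measure the longest
# start tag and the longest quoted attribute value in an HTML string.
# Regex-based, so it only approximates what a real parser sees.

START_TAG  = /<[a-zA-Z][^>]*>/
ATTR_VALUE = /=\s*("[^"]*"|'[^']*')/

def buffer_report(html)
  html = html.dup.force_encoding(Encoding::BINARY)  # avoid errors on invalid bytes
  tags   = html.scan(START_TAG)
  values = html.scan(ATTR_VALUE).map { |(quoted)| quoted[1..-2] }  # strip quotes
  {
    longest_tag:   tags.map(&:bytesize).max || 0,
    longest_value: values.map(&:bytesize).max || 0
  }
end
```

If the numbers are large, the setting Ken describes appears to be `Hpricot.buffer_size=`, set before parsing (e.g. `Hpricot.buffer_size = 262144`); treat that name as an assumption and check it against the docs of the installed gem.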

+1 A:

I was trying to parse HTML pages with many Unicode characters in them, and Hpricot kept crashing. Finally, I took the monkey patch from sanitize and put it in environment.rb for my Rails application. There hasn't been a single crash since I added this patch:

http://github.com/rgrove/sanitize/blob/1e1dc9681de99e32dc166f591343dfa60fc1f648/lib/sanitize/monkeypatch/hpricot.rb

mehdi 2009-09-03 19:11:53

This worked perfectly! I know I should switch to Nokogiri (and plan to), but I needed this fix for an older project!

John 2010-08-10 18:05:27

ansaurus
