The documentation for LibXML::XML::Document#find says that the following code style must be used to avoid seg faults:

nodes = doc.find('/header')
nodes.each do |node|
  # ... do stuff ...
end

Is this all I need to do? Below the example code box there is some commented-out code:

# nodes = nil
# GC.start

Do I need to include this code as well to be sure of avoiding a seg fault? I wouldn't have thought that the style shown in the first block of code would help much with reference problems. I tried it without the commented-out code and have had no problems after processing a large number of files, but maybe the crash only crops up under rare circumstances.

A: 

No. The commented-out code suggests the author was worried about how the library interacts with the garbage collector. As the follow-up post explains:

When the process terminates, Ruby sometimes frees the document object before the nodes object, thereby causing a segmentation fault.

Before anyone asks, the nodes class has a mark function that tells Ruby that it is dependent on the document. The mark function works fine, and if the following two lines of code are added to the end of the test code all is well:

nodes = nil
GC.start


I wouldn't worry about it too much because:

(a) The problem refers to the library as it was in 2008.

(b) Many of us have used LibXML and, apart from it being a pain to use, it is fast and stable, so the author must have sorted out his problems.
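
If you would rather be explicit anyway, the two lines from that post can be wrapped in a small helper. This is only a minimal sketch: `each_xpath_node` is a hypothetical name, not part of libxml-ruby, and it assumes nothing about `doc` beyond a `find` method whose result responds to `each`. Whether it is still needed with current versions of the library is not something I have verified.

```ruby
# Hypothetical helper (not part of libxml-ruby): iterate over the XPath
# results, then drop the result set and run the GC while the document
# is still referenced by the caller.
def each_xpath_node(doc, xpath)
  nodes = doc.find(xpath)
  count = 0
  nodes.each do |node|
    yield node
    count += 1
  end
  count # number of nodes visited
ensure
  nodes = nil # release our reference to the result set
  GC.start    # request a collection before the document can go away
end

# Assumed usage with libxml-ruby (requires the gem to be installed):
#   require 'libxml'
#   doc = LibXML::XML::Document.file('example.xml')
#   each_xpath_node(doc, '/header') { |node| puts node.name }
```

The `ensure` clause runs even if the block raises, so the result set is released before the document on every path through the helper.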

If you are looking for alternatives, take a look here

Chris

Chris McCauley
I found this post, http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/17694 (by the author of libxml-ruby, I guess), which indicates that at some point there was a problem with not being able to predict the order in which Ruby objects are destroyed at script termination. Maybe this has been solved, but the libxml-ruby documentation is unclear. Perhaps I should contact the author. I'm aware of the alternatives and know that some of them provide a friendlier, more idiomatic Ruby interface. I'm interested in libxml because it's fast and fairly complete; at minimum I need schema validation and XPath.
Shane
Updated the answer to reflect that (thanks). Advice still stands, LibXML is pretty well tested but other (possibly better supported) alternatives exist.
Chris McCauley