I know that there are dozens of ways to select the first child element in Nokogiri, but which is the cheapest? I can't get around using Node#children, which sounds awfully expensive. Say there are 10000 child nodes, and I don't want to touch the other 9999...

+1  A: 

You can try it yourself and benchmark the result.

I created a quick benchmark: http://gist.github.com/283825

$ ruby test.rb 
Rehearsal ---------------------------------------------------
xpath/first()     3.290000   0.030000   3.320000 (  3.321197)
xpath.first       3.360000   0.010000   3.370000 (  3.381171)
at                4.540000   0.020000   4.560000 (  4.564249)
at_xpath          3.420000   0.010000   3.430000 (  3.430933)
children.second   0.220000   0.010000   0.230000 (  0.233090)
----------------------------------------- total: 14.910000sec

                      user     system      total        real
xpath/first()     3.280000   0.000000   3.280000 (  3.288647)
xpath.first       3.350000   0.020000   3.370000 (  3.374778)
at                4.530000   0.040000   4.570000 (  4.580512)
at_xpath          3.410000   0.010000   3.420000 (  3.421551)
children.second   0.220000   0.010000   0.230000 (  0.226846)

From my tests, children appears to be the fastest method.
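The gist itself is not reproduced here, so the following is only a minimal sketch of how such a comparison can be set up with Ruby's standard Benchmark library. The "Rehearsal" block in the output above is what Benchmark.bmbm prints. The `compare` helper and the two stand-in workloads are hypothetical and use a plain Array so the snippet runs without Nokogiri; to repeat the test, replace the lambdas with the Nokogiri calls you want to compare.

```ruby
require "benchmark"

# Hypothetical helper (not taken from the gist): runs each labelled lambda
# `iterations` times under Benchmark.bmbm, which does a rehearsal pass first.
def compare(candidates, iterations: 1_000)
  width = candidates.keys.map(&:length).max
  Benchmark.bmbm(width) do |bm|
    candidates.each do |label, work|
      bm.report(label) { iterations.times { work.call } }
    end
  end
end

# Stand-in data: a plain Array, NOT a Nokogiri document.
items = Array.new(10_000) { |i| "item#{i}" }

candidates = {
  "copy then index" => -> { items.dup[1] }, # builds a full copy first
  "index directly"  => -> { items[1] }      # touches a single slot
}

# bmbm prints the report and returns one Benchmark::Tms per candidate.
results = compare(candidates, iterations: 200)
```

As the comment below suggests, it is worth varying both the number of nodes and the number of iterations, since the ranking can change with document size.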

Simone Carletti
The first four approaches all use XPath, which is very slow. The children approach, as mentioned in the question, parses the whole parent node, which is also unacceptable. Try them out with 100 times as many nodes and 1/100 as many tests.
Styggentorsken
Thanks for showing me the benchmark library by the way... I think it might be veeeery useful in the future :-)
Styggentorsken
A: 

An approach that neither uses XPath nor results in parsing the whole parent is to combine Node#child(), Node#next_sibling() and Node#element?().

Something like this...

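def first(node)
  # Start at the first child node, which may be text or a comment.
  element = node.child
  while element
    # Stop at the first node that is an actual element.
    return element if element.element?
    # Node#next is an alias for Node#next_sibling.
    element = element.next
  end
  nil # the node has no element children
end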
Styggentorsken
+1  A: 

Node#child is the fastest way to get the first child element.

However, if the node you're looking for is NOT the first (e.g., the 99th), then there is no faster way to select that node than to call #children and index into it.

You are correct in stating that it's expensive to build a NodeSet for all children if you only want the first one.

One limiting factor is that libxml2 (the XML library underlying Nokogiri) stores a node's children as a linked list. So you'll need to traverse the list (O(n)) to select the desired child node.
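To make the cost concrete, here is a toy model of that layout in plain Ruby. It is only an illustration, not libxml2's actual C structures: `ToyNode`, `build_parent` and `walk_to` are made-up names, and the only assumption carried over is that a parent points to its first child and each child points to its next sibling.

```ruby
# Toy model only: a parent knows its first child, and every child
# knows its next sibling. Nothing here comes from libxml2 itself.
ToyNode = Struct.new(:name, :first_child, :next_sibling)

# Builds a parent with `count` children chained through next_sibling.
def build_parent(count)
  parent = ToyNode.new("parent")
  previous = nil
  count.times do |i|
    child = ToyNode.new("child#{i}")
    if previous
      previous.next_sibling = child
    else
      parent.first_child = child
    end
    previous = child
  end
  parent
end

# Follows sibling links until `index` is reached.
# Returns the node (or nil) and the number of links followed.
def walk_to(parent, index)
  hops = 0
  current = parent.first_child
  while current && hops < index
    current = current.next_sibling
    hops += 1
  end
  [current, hops]
end

parent = build_parent(10_000)
first, first_hops = walk_to(parent, 0)
late, late_hops = walk_to(parent, 9_999)
puts "#{first.name}: #{first_hops} hops" # child0: 0 hops
puts "#{late.name}: #{late_hops} hops"   # child9999: 9999 hops
```

Reaching the first child costs nothing extra, while reaching a later child means following one link per preceding sibling.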

It would be feasible to write a method that simply returns the nth child without instantiating a NodeSet, or even Ruby objects, for all the other children. My advice would be to open a feature request at http://github.com/tenderlove/nokogiri/issues or send an email to the nokogiri mailing list.
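In the meantime, something similar can be approximated in plain Ruby. The sketch below is an assumption about how one might do it, not Nokogiri's API: `nth_child` and its `elements_only` option are hypothetical names. It relies only on Node#child, Node#next_sibling and Node#element?, so it skips the NodeSet, but it still creates a Ruby object for every sibling it walks past; a native implementation could avoid that as well.

```ruby
# Hypothetical helper, not part of Nokogiri. Works with any object that
# responds to #child, #next_sibling and #element?, such as
# Nokogiri::XML::Node.
#
# Returns the child at zero-based `index`, or nil if there is none.
# With elements_only: true, text and comment nodes are not counted.
def nth_child(node, index, elements_only: false)
  return nil if index.negative?

  current = node.child
  while current
    if !elements_only || current.element?
      return current if index.zero?
      index -= 1
    end
    current = current.next_sibling
  end
  nil
end
```

With a parsed Nokogiri document in `doc`, `nth_child(doc.root, 98)` should return the same node as `doc.root.children[98]`, without building the intermediate NodeSet.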

Mike Dalessio
It has been done! Thanks :-) http://github.com/tenderlove/nokogiri/issues#issue/211
Styggentorsken