ansaurus

Question

Nokogiri: Select content between element A and B

Answer 1

+2 A:

A way-too-smart one-liner that uses recursion:

def collect_between(first, last)
  first == last ? [first] : [first, *collect_between(first.next, last)]
end

An iterative solution:

def collect_between(first, last)
  result = [first]
  until first == last
    first = first.next
    result << first
  end
  result
end
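Both versions assume that last is a later sibling of first; if it is not, first.next eventually returns nil and the call fails. A minimal sketch of a guarded variant follows. FakeNode and collect_between_safe are hypothetical names used only for illustration, with FakeNode standing in for a Nokogiri node (only its next accessor is needed here):

```ruby
# FakeNode is a hypothetical stand-in for Nokogiri::XML::Node;
# only the `next` accessor matters for this sketch.
FakeNode = Struct.new(:name, :next)

# Hypothetical guarded variant: returns nil instead of raising
# when `last` is never reached while walking the siblings.
def collect_between_safe(first, last)
  result = []
  node = first
  while node
    result << node
    return result if node == last
    node = node.next
  end
  nil
end

# Build a sibling chain a -> b -> c -> d.
d = FakeNode.new("d", nil)
c = FakeNode.new("c", d)
b = FakeNode.new("b", c)
a = FakeNode.new("a", b)

between = collect_between_safe(b, d).map(&:name) # => ["b", "c", "d"]
missing = collect_between_safe(c, a)             # => nil, a comes before c
```

Returning nil for an unreachable end node is one possible design choice; raising an ArgumentError would work just as well.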

EDIT: (Short) explanation of the asterisk

It's called the splat operator. It "unrolls" an array:

array = [3, 2, 1]
[4, array]  # => [4, [3, 2, 1]]
[4, *array] # => [4, 3, 2, 1]

some_method(array)  # => some_method([3, 2, 1])
some_method(*array) # => some_method(3, 2, 1)

def other_method(*array); array; end
other_method(1, 2, 3) # => [1, 2, 3]
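To see why the recursive one-liner needs the splat, here is a small sketch that swaps the nodes for integers (Integer#next returns the successor). The method names nested and flat are made up for illustration. Without the asterisk, every recursive call nests its result one level deeper:

```ruby
# Integers stand in for nodes; Integer#next returns the successor.

# Same shape as the one-liner, but without the splat.
def nested(first, last)
  first == last ? [first] : [first, nested(first.next, last)]
end

# With the splat, the recursive result is unrolled into the outer array.
def flat(first, last)
  first == last ? [first] : [first, *flat(first.next, last)]
end

without_splat = nested(1, 4) # => [1, [2, [3, [4]]]]
with_splat    = flat(1, 4)   # => [1, 2, 3, 4]
```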

Magnus Holm 2009-05-04 16:20:10

Thanks for your solutions and thanks for your über-smart recursive one-liner! Though, I don't understand what the '*' before the recursive call of collect_between() stands for. Could you elaborate?

Javier 2009-05-06 08:21:50

I've added a tiny explanation in my original answer. Google around for "splat operator" for more :-)

Magnus Holm 2009-05-07 19:58:50

Thanks! Just out of curiosity: Do you have any idea where the "splat operator" is documented? Couldn't find a word in http://www.ruby-doc.org/core/

Javier 2009-05-09 09:58:45

...and googling around for '*' definitely makes no sense. ;-)

Javier 2009-05-09 10:00:25

Answer 2

+1 A:

# monkeypatches for Nokogiri::NodeSet
# note: versions of these functions will be in Nokogiri 1.3
class Nokogiri::XML::NodeSet
  unless method_defined?(:index)
    def index(node)
      each_with_index { |member, j| return j if member == node }
      nil # not found; without this line the NodeSet itself would be returned
    end
  end

  unless method_defined?(:slice)
    def slice(start, length)
      new_set = Nokogiri::XML::NodeSet.new(self.document)
      length.times { |offset| new_set << self[start + offset] }
      new_set
    end
  end
end

#
#  solution #1: picking elements out of node children
#  NOTE that this will also include whitespacy text nodes between the <p> elements.
#
possible_matches = parent.children
start_index = possible_matches.index(@start_element)
stop_index = possible_matches.index(@end_element)
answer_1 = possible_matches.slice(start_index, stop_index - start_index + 1)

#
#  solution #2: picking elements out of a NodeSet
#  this will only include elements, not text nodes.
#
possible_matches = value.xpath("//body/*")
start_index = possible_matches.index(@start_element)
stop_index = possible_matches.index(@end_element)
answer_2 = possible_matches.slice(start_index, stop_index - start_index + 1)
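The index arithmetic is the same as for a plain Ruby Array, whose index and slice(start, length) take the same arguments as the NodeSet methods above. A small sketch with strings standing in for nodes (the sample data and the helper name inclusive_range are invented for illustration) shows why the length is stop_index - start_index + 1 and how the two solutions differ:

```ruby
# Strings stand in for nodes; "\n" plays the role of a whitespace text node.
children = ["p1", "\n", "p2", "\n", "p3", "\n", "p4"]
elements = children.reject { |node| node.strip.empty? }

# Hypothetical helper wrapping the index/slice steps.
def inclusive_range(set, start_node, end_node)
  start_index = set.index(start_node)
  stop_index  = set.index(end_node)
  # +1 because both endpoints are part of the result.
  set.slice(start_index, stop_index - start_index + 1)
end

with_text_nodes = inclusive_range(children, "p2", "p4")
# => ["p2", "\n", "p3", "\n", "p4"]
elements_only = inclusive_range(elements, "p2", "p4")
# => ["p2", "p3", "p4"]
```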

Mike Dalessio 2009-05-07 11:09:29

...i'm really looking forward to nokogiri 1.3. :)

Javier 2009-05-07 11:14:33

please note that NodeSet#slice and NodeSet#index are now in Nokogiri master on github. these will be in the 1.3.0 release later this month.

Mike Dalessio 2009-05-09 17:37:17

Answer 3

+1 A:

For the sake of completeness, an XPath-only solution :)
It builds the intersection of two sets: the following siblings of the start element and the preceding siblings of the end element.

Basically you can build an intersection with:
  $a[count(.|$b) = count($b)]
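The counting trick works because adding a node to $b leaves the size of $b unchanged only when that node is already a member of $b. The same reasoning in plain Ruby, as an illustrative sketch that uses made-up id strings rather than real nodes:

```ruby
# Stand-ins for the two node sets, assuming paragraphs para-1 .. para-9,
# a start element para-3 and an end element para-7.
set_a = %w[para-4 para-5 para-6 para-7 para-8 para-9] # following siblings of start
set_b = %w[para-1 para-2 para-3 para-4 para-5 para-6] # preceding siblings of end

# Mirrors $a[count(.|$b) = count($b)]: keep a node from set_a only if
# the union of that node with set_b is no larger than set_b itself.
intersection = set_a.select { |node| ([node] | set_b).size == set_b.size }
# => ["para-4", "para-5", "para-6"]
```

In Ruby itself set_a & set_b gives the same result directly; XPath 1.0 has no intersection operator, which is why the count comparison is needed there.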

Split into variables for readability:

@start_element = "//p[@id='para-3']"
@end_element = "//p[@id='para-7']"
@set_a = "#@start_element/following-sibling::*"
@set_b = "#@end_element/preceding-sibling::*"

@my_content = value.xpath("#@set_a[ count(.|#@set_b) = count(#@set_b) ]
                         | #@start_element | #@end_element")

Siblings don't include the element itself, so the start and end elements must be included in the expression separately.

Edit: Easier solution:

@start_element = "p[@id='para-3']"
@end_element = "p[@id='para-7']"
@my_content = value.xpath("//*[preceding-sibling::#@start_element and
                               following-sibling::#@end_element]
                         | //#@start_element | //#@end_element")

andre-r 2009-09-27 21:06:10
