I have a prepared Nokogiri page from which the junk has been removed, but the remaining text is still split across separate nodes.

What I want to do is merge all directly adjacent text nodes into a single text node.

Here is what I came up with:

#merge neighbour text nodes -> connect content
def merge_text_nodes(node)
  previoustext = false
  node.children.each_with_index do |item,i|
    if item.name != 'text()'
      merge_text_nodes(item)
      previoustext = false
    else
      if previoustext
        node.children[i-1].inner_html += item.inner_html
        item.remove
      end
      previoustext = true
    end
  end
end

But it doesn't work as expected; it seems to do nothing at all. Can someone point out the error or show me the correct way to do it?

A: 

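Okay, I finally got it right myself. The first attempt never reached the merge branch because Nokogiri names text nodes 'text', not 'text()', so every child was treated as a non-text node. Checking element.text? avoids that: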

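# Recursively merge runs of adjacent text nodes under node into single text nodes.
def merge_text_nodes(node)
  prev_is_text = false

  newnodes = []
  node.children.each do |element|
    if element.text?
      if prev_is_text
        # Append to the text node collected in the previous iteration.
        newnodes[-1].content += element.text
      else
        newnodes << element
      end
      element.remove
      prev_is_text = true
    else
      # Non-text child: merge inside it first, then keep it as it is.
      newnodes << merge_text_nodes(element)
      element.remove
      prev_is_text = false
    end
  end

  # Re-attach the rebuilt child list in the original order.
  node.children.remove
  newnodes.each do |item|
    node.add_child(item)
  end

  node
end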
apirogov