How to split an HTML document using Nokogiri?

Right now, I split the HTML document into small pieces like this (the regular expression is simplified: it skips matching the header tag content and the closing tag):

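require 'nokogiri'

# Split the body's inner HTML at every opening <h2>..<h6> tag, then parse each piece
document.at('body').inner_html.split(/<\s*h[2-6][^>]*>/i).collect do |fragment|
  Nokogiri::HTML(fragment)
end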

Is there an easier way to perform this split?

The document is very simple: it contains only headers, paragraphs, and formatted text. For example:

<body>
<h1>Main</h1>
<h2>Sub 1</h2>
<p>Text</p>
-----
<h2>Sub 2</h2>
<p>Text</p>
-----
<h3>Sub 2.1</h3>
<p>Text</p>
-----
<h3>Sub 2.2</h3>
<p>Text</p>
</body>

For that sample, I need to get four pieces (the ----- lines mark the desired split points).

ansaurus
