
I'm using rake to create a Table of contents from a bunch of static HTML files.

The question is: how do I insert it into all of the files from within rake?

I have a <ul id="toc"> in each file to target, and I want to replace its entire content.

I was thinking of using Nokogiri or a similar library to parse each document and replace the ul#toc node. However, I don't like the idea of writing the parser's DOM back to the HTML file. What if it changes my layout, indentation, etc.?

Any thoughts/ideas? Or perhaps links to working examples?

A:

You can manipulate the document directly and save the resulting output. If you confine your manipulations to a particular element, you won't alter the overall structure and should be fine.

A library like Nokogiri or Hpricot should only make substantial changes to your document if it's malformed. I know that Hpricot can be told to parse more leniently, or to work in a stricter XML/XHTML mode.
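If you want to check that claim against your own files, one option is to compare each document before and after the rewrite with the TOC element removed from both: if the remainders are identical, nothing outside the TOC moved. This is a minimal sketch; the helper names are hypothetical, and the non-greedy regex assumes the TOC is a single <ul id="toc"> with no nested <ul> inside it.

```ruby
# Matches the whole TOC element, across lines (/m).
# Assumes the TOC list contains no nested <ul> elements.
TOC_PATTERN = %r{<ul id="toc">.*?</ul>}m

# Hypothetical helper: the document with the TOC element cut out.
def without_toc(html)
  html.sub(TOC_PATTERN, '')
end

# True if +before+ and +after+ are byte-identical everywhere except the TOC.
def only_toc_changed?(before, after)
  without_toc(before) == without_toc(after)
end

before = <<~HTML
  <body>
    <ul id="toc"></ul>
    <h1>Intro</h1>
  </body>
HTML

after = before.sub(TOC_PATTERN, '<ul id="toc"><li>Intro</li></ul>')

puts only_toc_changed?(before, after)                        # prints "true"
puts only_toc_changed?(before, after.sub('  <h1>', '<h1>'))  # prints "false"
```

Running the real rewrite (through Hpricot, Nokogiri or plain string replacement) and feeding the old and new file contents to only_toc_changed? tells you whether indentation or markup elsewhere was touched.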

Simple example:

require 'rubygems'
require 'hpricot'

document = <<END
<html>
<body>
<ul id="tag">
</ul>
<h1 class="indexed">Item 1</h1>
<h2 class="indexed">Item 1.1</h2>
<h1 class="indexed">Item 2</h1>
<h2 class="indexed">Item 2.1</h2>
<h2 class="indexed">Item 2.2</h2>
<h1>Remarks</h1>
<!-- Test Comment -->
</body>
</html>
END

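parsed = Hpricot(document)

# The (empty) list that will hold the table of contents
ul_tag = (parsed / 'ul#tag').first

# Every heading marked for inclusion, in document order
sections = (parsed / '.indexed')

# Array#join concatenates the items; Array#to_s only did that on Ruby 1.8
ul_tag.inner_html = sections.collect { |i| "<li>#{i.inner_html}</li>" }.join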

puts parsed.to_html

This will yield:

<html>
<body>
<ul id="tag"><li>Item 1</li><li>Item 1.1</li><li>Item 2</li><li>Item 2.1</li><li>Item 2.2</li></ul>
<h1 class="indexed">Item 1</h1>
<h2 class="indexed">Item 1.1</h2>
<h1 class="indexed">Item 2</h1>
<h2 class="indexed">Item 2.1</h2>
<h2 class="indexed">Item 2.2</h2>
<h1>Remarks</h1>
<!-- Test Comment -->
</body>
</html>
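If you'd rather avoid a parser dependency altogether, roughly the same flat list can be produced with only Ruby's standard library by scanning for the indexed headings with a regular expression and splicing the items into the list. This is a sketch, not a general HTML solution: it assumes the headings use exactly the class="indexed" attribute form shown above and that the <ul id="tag"> has no nested lists. The constant and method names are hypothetical.

```ruby
# Headings marked for inclusion; \1 makes the closing tag match the opening one.
HEADING = %r{<(h[1-6]) class="indexed">(.*?)</\1>}m
# The target list, capturing its opening and closing tags so they are kept.
TOC_LIST = %r{(<ul id="tag">).*?(</ul>)}m

def build_toc(html)
  items = html.scan(HEADING).map { |_tag, text| "<li>#{text}</li>" }
  # Block form of #sub: $1/$2 come from the TOC_LIST match, and the
  # replacement text is used literally (no backslash interpretation).
  html.sub(TOC_LIST) { "#{$1}#{items.join}#{$2}" }
end

document = <<~HTML
  <html>
  <body>
  <ul id="tag">
  </ul>
  <h1 class="indexed">Item 1</h1>
  <h2 class="indexed">Item 1.1</h2>
  <h1>Remarks</h1>
  </body>
  </html>
HTML

puts build_toc(document)
```

Only the span between <ul id="tag"> and </ul> is rewritten, so every other byte of the file, including indentation and comments, is left exactly as it was.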
tadman
A:

Could you convert the files to .rhtml (ERB) templates, where

<ul id="toc">

is replaced with an ERB directive such as

<%= get_toc() %>

where get_toc() is defined in some library module? Write the rendered output as .html files (to another directory, if you like) and you're in business, and the process is repeatable.
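A minimal sketch of the ERB route using Ruby's standard erb library. The template text and the body of get_toc are placeholders for illustration only; in a real setup get_toc would build the list from your pages, and the template would be read from a .rhtml file.

```ruby
require 'erb'

# Hypothetical TOC helper, defined here only so the sketch runs.
def get_toc
  '<ul id="toc"><li><a href="intro.html">Intro</a></li></ul>'
end

# Stand-in for the contents of a hypothetical page.rhtml:
# the TOC placeholder is an ERB output tag.
template = <<~RHTML
  <html>
  <body>
  <%= get_toc() %>
  <h1>Intro</h1>
  </body>
  </html>
RHTML

# Render the template; get_toc is reachable through the top-level binding.
html = ERB.new(template).result(binding)
puts html
# To save it: File.write('out/page.html', html)
```

Because the .rhtml files are the sources and the .html files are always regenerated, the step can be rerun whenever the TOC changes.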

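Or, for that matter, why not just use gsub? Something like:

File.open(out_filename, 'w') do |output_file|
  # Replace the whole <ul id="toc">...</ul> element (assumes no nested <ul>),
  # so get_toc() should return a complete <ul id="toc">...</ul>.
  # /m lets .*? span lines; the block form keeps backslashes in the TOC literal.
  output_file.write File.read(filename).gsub(%r{<ul id="toc">.*?</ul>}m) { get_toc() }
end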
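The answer leaves get_toc() undefined. One hypothetical way to fill it in is to pass it the pages and link to each one, using the page's first <h1> as the link text. Note that this version takes a hash of file name to HTML as an argument (an assumption; the call above has none), and the regex scan assumes simple, hand-written markup.

```ruby
# Hypothetical get_toc: one link per page, labelled with the page's first
# <h1> (or the file name if the page has none).
def get_toc(pages)
  items = pages.map do |name, html|
    title = html[%r{<h1[^>]*>(.*?)</h1>}m, 1] || name
    %(<li><a href="#{name}">#{title}</a></li>)
  end
  %(<ul id="toc">#{items.join}</ul>)
end

pages = {
  'intro.html' => '<h1>Introduction</h1><p>...</p>',
  'setup.html' => '<h1 class="indexed">Setup</h1>',
  'notes.html' => '<p>No heading here</p>'
}
puts get_toc(pages)
```

To build the hash from disk, something like Dir.glob('src/*.html').map { |f| [File.basename(f), File.read(f)] }.to_h would work, with src/ standing in for wherever your pages live.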
Mike Woodhouse
A:

I ended up doing something similar to what Mike Woodhouse suggested, only without ERB templates, since I wanted the source files to stay freely editable by non-Ruby-lovers too.

  def update_toc(filename)
    raise "FATAL: Requires self.toc= ... before replacing TOC in files!" if @toc.nil?
    content = File.read(filename)
    # /m lets .+? match across newlines, so a multi-line TOC is replaced too;
    # the block form keeps any backslashes in @toc literal
    content.gsub(/<h2 class="toc">.+?<\/ul>/m) { @toc }
  end

  def replace_toc_in_all_files
    @file_names.each do |name|
      content = update_toc(name)
      File.write(name, content)
    end
  end
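Since the question asks how to do this from within rake, here is a sketch of a Rakefile task that regenerates the TOC in place, in the spirit of the code above. The src/ directory, the toc_for helper and the :toc task name are all hypothetical, and the non-greedy regex assumes the TOC list contains no nested <ul>.

```ruby
require 'rake'

# Whole TOC element, across lines; assumes no nested <ul> inside it.
TOC_REGEXP = %r{<ul id="toc">.*?</ul>}m

# Minimal TOC builder for the sketch: one link per page, named after the file.
def toc_for(files)
  items = files.map do |f|
    %(<li><a href="#{File.basename(f)}">#{File.basename(f, '.html')}</a></li>)
  end
  %(<ul id="toc">#{items.join}</ul>)
end

desc 'Regenerate the table of contents in every HTML file under src/'
task :toc do
  files = Dir.glob('src/*.html').sort
  toc = toc_for(files)
  files.each do |name|
    content = File.read(name)
    # Block form: the TOC is inserted literally; everything else is untouched
    updated = content.sub(TOC_REGEXP) { toc }
    # Only rewrite files whose TOC actually changed
    File.write(name, updated) unless updated == content
  end
end
```

In a Rakefile the require line can be dropped and the task run with rake toc. Skipping unchanged files keeps their modification times stable, which helps if other rake tasks depend on them.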
Jesper Rønn-Jensen