Is there a CPAN module or code snippet that I can use to modify local HTML files without using regular expressions?

What I want to do:

  1. Change a start tag (example: <div> to <div id="newtag">)
  2. Add a tag before another (example: </head> to <script type="text/javascript"> ...</script></head>)
  3. Remove tags
  4. Read the content of a given tag (OK, this one can be done with an XML / HTML parser)
+1  A: 

CPAN

A simple CPAN search returns

XPATH

It sounds like you are not familiar with XPath. Here is a quick tutorial to get you started. It's not Perl-specific, but it explains the concepts.

Shiftbit
+5  A: 

If you have HTML, and not XHTML, then you don't want to be using an XML parser.

HTML::Parser is the standard HTML parser for Perl. Pretty much everything else is built on top of it.

HTML::TokeParser is an alternative interface to HTML::Parser. It returns things on demand instead of passing everything to callbacks.
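
As a rough illustration of the on-demand style, here is a minimal sketch that reads the text of the <title> tag. The sample HTML string is made up for the example; a file name could be passed to new() instead of a string reference.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document; a real script would usually pass a file name.
my $html = '<html><head><title>Demo page</title></head>'
         . '<body><p>Hi</p></body></html>';

# A reference to a string tells HTML::TokeParser to parse that string.
my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

my $title = '';

# get_tag() skips forward to the next <title> start tag (undef if none is found).
if ($parser->get_tag('title')) {
    # Collect the text up to the closing </title>, with whitespace trimmed.
    $title = $parser->get_trimmed_text('/title');
}

print "$title\n";
```

This only covers reading (item 4 in the question). For the modifications, a tree-based module is probably more convenient.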

HTML::TreeBuilder builds a DOM-like tree from the HTML, which you can then modify.
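
The four tasks from the question could look roughly like this with HTML::TreeBuilder. This is a sketch, not a tested recipe: the sample HTML, the choice of <span> as the tag to remove, and the app.js file name are all made up for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Element;

# Hypothetical sample document.
my $html = <<'HTML';
<html><head><title>Demo</title></head>
<body><div><p>Hello</p><span>remove me</span></div></body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# 1. Change a start tag: <div> becomes <div id="newtag">.
my $div = $tree->look_down(_tag => 'div');
$div->attr(id => 'newtag') if $div;

# 2. Add a <script> element as the last child of <head>,
#    so it ends up right before </head>.
my $head = $tree->look_down(_tag => 'head');
my $script = HTML::Element->new(
    'script',
    type => 'text/javascript',
    src  => 'app.js',            # assumed file name
);
$head->push_content($script) if $head;

# 3. Remove a tag together with its content.
#    (replace_with_content() would drop the tag but keep its children.)
my $span = $tree->look_down(_tag => 'span');
$span->delete if $span;

# 4. Read the text content of a given tag.
my $p    = $tree->look_down(_tag => 'p');
my $text = $p ? $p->as_text : '';

# Serialize: default entity encoding, two-space indent, all end tags emitted.
my $out = $tree->as_HTML(undef, '  ', {});
$tree->delete;    # free the tree explicitly

print $out;
```

Note that HTML::TreeBuilder normalizes the document while parsing (it adds implied tags, for instance), so the output will not be byte-for-byte identical to the input outside the changed parts.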

HTML::TreeBuilder::XPath extends HTML::TreeBuilder with XPath support.
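
For example, selecting nodes by an XPath expression might look like the sketch below. The sample HTML and the "note" class are assumptions made for the example; the nodes returned are ordinary HTML::Element objects, so the usual modification methods apply.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample document.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><div class="note">one</div><div>two</div></body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Find every <div> whose class attribute is "note" and give it an id.
my @notes = $tree->findnodes('//div[@class="note"]');
$_->attr(id => 'newtag') for @notes;

# findvalue() returns the text value of the matched node(s).
my $title = $tree->findvalue('//title');

my $out = $tree->as_HTML(undef, '  ', {});
$tree->delete;

print "Title: $title\n";
print $out;
```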

HTML::Query extends HTML::TreeBuilder with jQuery-like selectors.

pQuery is another module that brings more complete jQuery compatibility to HTML::TreeBuilder.

cjm