I am trying to rework many pages across many sites. The pages may contain JavaScript, PHP, or ASP code in addition to HTML. The problem I'm encountering is that the module I'm using (HTML::TreeBuilder) rewrites things I don't want rewritten. I've managed to handle most of the symbols (e.g., ", >) in HTML tags like <script>, but they get changed into entities (e.g., &quot;, &gt;) in the PHP sections. Plus, the PHP tags are stripped out at the same time.
If I have a PHP file that looks like this:
<html>
<head><title>My Page</title></head>
<body>
<p>Some cruft which I want to repeat</p>
<form name="foo"> (form content to be replaced)
</form>
<script type="JavaScript">
<!--
Some javaScript to be left alone
-->
</script>
<a href="somepage.php">Link to be removed</a>
<?php
if (strlen($txtKeyword) > 2)
{
echo " or <a href=\"database_search_keyword.htm\">Search again?</a></p>";
if(isset($_REQUEST['nr']))
{
$numRows = $_REQUEST['nr'];
....
?>
</body>
</html>
I want the final result to look like:
<html>
<head><title>My Page</title></head>
<body>
<p>Some cruft which I want to repeat</p>
<ul><li>List replacing form</li>
</ul>
<script type="JavaScript">
<!--
Some javaScript to be left alone
-->
</script>
<?php
if (strlen($txtKeyword) > 2)
{
echo " or <a href=\"database_search_keyword.htm\">Search again?</a></p>";
if(isset($_REQUEST['nr']))
{
$numRows = $_REQUEST['nr'];
....
?>
</body>
</html>
As I said, I'm able to get everything working except the PHP. It gets mangled, so the result is:
<html>
<head><title>My Page</title></head>
<body>
<p>Some cruft which I want to repeat</p>
<ul><li>List replacing form</li>
</ul>
<script type="JavaScript">
<!--
Some javaScript to be left alone
-->
</script>
<?php
if (strlen($txtKeyword) > 2)
{
echo " or ";
if(isset($_REQUEST['nr']))
{
$numRows = $_REQUEST['nr'];
....
?>
</body>
</html>
I have been working with HTML::TreeBuilder 3.23. I've tried the developer release 3.23_3, but it gives an error message caused by the PHP code (e.g., a has an invalid attribute name '"&section_id' ' . $section_id . ').
Example code for what I've done so far (with the filesystem walking, etc. chopped out) is:
#!/usr/bin/perl -w
use strict;
use HTML::TreeBuilder;
# Set up replacement forms
my $artistSearch = HTML::Element->new ('~literal', 'text', <<EOF);
<p>Please select from the list below.</p>
<ul>
<li><a href="http://firstlink.com/">item 1</a></li>
<li><a href="http://secondlink.com/">item 1</a></li>
</ul>
EOF
my $filename = "AFA.php";
my $file = HTML::TreeBuilder->new();
$file->store_comments(1);
$file->ignore_ignorable_whitespace(1);
$file->no_space_compacting(1);
my $tree = $file->parse_file($filename);
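# In scalar context, find_by_tag_name returns only the first <form>
my $form = $tree->find_by_tag_name('form');
if ($form) {
    # The name attribute may be missing, so default to an empty string
    my $fname = $form->attr('name');
    $fname = '' unless defined $fname;
    if ($fname eq 'mainform') {
        $form->delete;
    } elsif ($fname eq 'artist_search') {
        $form->replace_with($artistSearch)->delete;
    } else {
        # It's a form we're not changing
    }
}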
my $printout = $file->as_HTML("", " ", {});
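# Overwrite the original file with the modified markup
open(my $page, '>', $filename) or die "Cannot write $filename: $!";
print {$page} $printout;
close($page) or die "Cannot close $filename: $!";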
$file->delete;
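One direction that might be worth trying, sketched here under assumptions and not tested against the real pages: swap each <?php ... ?> block for an HTML comment placeholder before parsing, then swap the original text back into the as_HTML output. The helper names mask_php and unmask_php and the PHPBLOCK marker are made up for illustration, and the regex assumes no PHP string contains a literal ?>. Since store_comments(1) is already set, the placeholder comments should in principle survive the parse, though I can't vouch for that in every HTML::TreeBuilder version. The sketch only shows the masking round trip, without the parser in between.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace each <?php ... ?> block with a comment
# placeholder and remember the original text in @$saved.
# Assumption: no PHP string contains a literal "?>".
sub mask_php {
    my ($html, $saved) = @_;
    my $stash = sub {
        push @$saved, $_[0];
        return '<!--PHPBLOCK' . (scalar(@$saved) - 1) . '-->';
    };
    $html =~ s/(<\?php.*?\?>)/$stash->($1)/gse;
    return $html;
}

# Hypothetical helper: put the saved PHP blocks back in place of the
# numbered placeholders.
sub unmask_php {
    my ($html, $saved) = @_;
    $html =~ s/<!--PHPBLOCK(\d+)-->/$saved->[$1]/ge;
    return $html;
}

my $source = <<'EOF';
<p>Some cruft which I want to repeat</p>
<?php
if (strlen($txtKeyword) > 2)
{
echo " or <a href=\"database_search_keyword.htm\">Search again?</a></p>";
}
?>
</body>
EOF

my @saved;
my $masked = mask_php($source, \@saved);

# In the real script, $masked would be fed to the parser as a string
# (e.g. with parse and eof instead of parse_file), and unmask_php would
# be applied to the as_HTML output instead of to $masked directly.
my $restored = unmask_php($masked, \@saved);

print $restored eq $source ? "round trip ok\n" : "round trip FAILED\n";
```

The same idea could presumably be extended to ASP blocks with a second pattern, but that is a guess on my part.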
I am open to any suggestions, examples, etc. I'm not necessarily tied to any particular module, but I'm not exactly an expert programmer.
Thank you!