I'm upgrading a set of web pages to a new system, and I want to strip out the boilerplate at the top of each page and replace it with new boilerplate. Fortunately, each page has a content table, and no tables before it. I want to do something like:

$contents =~ s/^.*<table/$newHeader/

This only works when <table is on the first line of $contents. Is there a way to replace everything before (and including) the first <table in the file with my new boilerplate?

+7  A: 

You could use Perl's "/s" option, which makes "." match every character, including newlines (the string is treated as one giant line instead of line by line). You can limit the match to the first table by putting ? after the * to make it non-greedy:

$contents =~ s/^.*?<table/$newHeader/s
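A quick way to see why the ? matters is to run the greedy and non-greedy patterns against a page that has two tables. The sample page and header strings below are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample page: the old boilerplate spans several lines,
# and the page contains two tables.
my $contents = "<html>\n<head><title>Old</title></head>\n<body>\n"
             . "<table id=\"content\">...</table>\n"
             . "<table id=\"footer\">...</table>\n";

my $newHeader = "<!-- new header -->\n";

# Greedy .* with /s backtracks to the LAST "<table" in the string,
# so the content table is thrown away along with the old header.
(my $greedy = $contents) =~ s/^.*<table/$newHeader/s;

# Non-greedy .*? with /s stops at the FIRST "<table".
(my $lazy = $contents) =~ s/^.*?<table/$newHeader/s;

print "greedy: $greedy";
print "lazy:   $lazy";
```

The `(my $copy = $contents) =~ s///` idiom just runs each substitution on a fresh copy so the two results can be compared side by side.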

Also remember that the substitution removes the text "<table" as well, so you will need to make sure it gets put back somehow, possibly with:

$contents =~ s/^.*?<table/$newHeader<table/s
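Order matters when putting the literal back: "<table" has to come after the new header, otherwise the opening tag gets split apart. A minimal check with a made-up page (the sample strings are assumptions, not from the question):

```perl
use strict;
use warnings;

# Hypothetical page and replacement header.
my $contents  = "<html><body>\nold header\n<table border=\"1\"><tr><td>x</td></tr></table>\n";
my $newHeader = "<html><body>\n<p>new header</p>\n";

# Header first, then the consumed "<table": the tag stays whole.
(my $fixed = $contents) =~ s/^.*?<table/$newHeader<table/s;

# "<table" first, then the header: the tag's attributes end up
# separated from "<table" by the entire header.
(my $broken = $contents) =~ s/^.*?<table/<table$newHeader/s;

print "fixed:\n$fixed";
print "broken:\n$broken";
```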

Or you can use a zero-width positive lookahead assertion, which says "this expression must match right after the match", but the text matched by the lookahead is not considered part of the match (and therefore won't be replaced):

$contents =~ s/^.*?(?=<table)/$newHeader/s
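A small sketch of the lookahead version on a made-up multi-line page. `$contents`, `$newHeader` and `$plain` hold hypothetical sample strings; the second part also shows that s/// returns a false value when nothing matched, which may be handy when processing many pages:

```perl
use strict;
use warnings;

my $contents  = "<html>\n<body>\nold header\n<table>\n<tr><td>data</td></tr>\n</table>\n";
my $newHeader = "<html>\n<body>\n<h1>New header</h1>\n";

# (?=<table) is zero-width: it requires "<table" to follow the match
# but does not consume it, so "<table" survives the substitution.
(my $updated = $contents) =~ s/^.*?(?=<table)/$newHeader/s;
print $updated;

# A page without a table: the pattern fails, s/// returns false,
# and the string is left untouched.
my $plain   = "<html>\n<body>\nno tables here\n";
my $changed = ($plain =~ s/^.*?(?=<table)/$newHeader/s) ? 1 : 0;
print "changed: $changed\n";
```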

And that will leave the "<table" intact.

Adam Batkin
Perfect. Thanks!
Dan D.
That's a nice answer, but it's not *perfect*. It's simply not correct that the /s option will make a dot match whitespace. A dot matches whitespace just fine, even without any options. Replace "whitespace" with "newline" and the answer is indeed perfect.
innaM
@Manni: You are right, fixed.
Adam Batkin
Thanks: +1.
innaM
+3  A: 

By default, "." matches any character except a newline. Append the "s" modifier to your regexp so that "." also matches newlines and the match can span multiple lines:

 $contents =~ s/^.*?<table/$newHeader/s;
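To see how "." behaves with and without /s (including the point from the comments that "." already matches spaces and tabs), here is a tiny test on a hypothetical string:

```perl
use strict;
use warnings;

my $text = "a \tb\nc";

# Without /s, "." matches spaces and tabs but not "\n",
# so ".*" cannot reach the "c" on the second line.
my $no_s   = ($text =~ /^a.*c$/)  ? 1 : 0;

# With /s, "." matches "\n" too, so the match spans both lines.
my $with_s = ($text =~ /^a.*c$/s) ? 1 : 0;

# Whitespace other than newline is matched by "." even without /s.
my $spaces = ($text =~ /^a..b/)   ? 1 : 0;

print "without /s: $no_s\nwith /s: $with_s\nspace+tab: $spaces\n";
```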
clintp