ansaurus

Question

Need Help with a regular expression not replacing all instances of an expression

Answer 1

+2 A:

Here's one that was actually tested and appears to work.

The issue is that once a match is found, the search for the next match resumes where the previous one ended. The closing <br /> of #My Novel has already been consumed by that match, so it cannot also serve as the preamble for #Chapter1, and #Chapter1 is missed.
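The behaviour described above can be reproduced with a small TypeScript sketch. Both the sample input and the preamble-capturing pattern are assumptions: the input is reconstructed from the output shown in this answer, and the pattern is reconstructed from the edit instructions in this answer. JavaScript regular expressions have no whitespace-ignoring mode and use $<name> instead of ${name} in the replacement, so the pattern is assembled from commented string pieces.

```typescript
// Assumed sample input, reconstructed from the output shown in this answer.
const sampleText =
  "</div>#My Novel<br />\n" +
  "#Chapter1<br />\n" +
  "It was a dark and stormy night<br />\n" +
  "#Chapter 2<br />\n" +
  "The End";

// Assumed original pattern: the preamble is a capturing group,
// so every match consumes the tag that precedes the hash marks.
const consumingPattern = new RegExp(
  "(?<preamble>(?:</\\w+\\d*>|<\\w+\\d*\\s*/>)\\s*)" + // </tag> or <tag />, then whitespace
    "(?<hashmarks>#{1,6})" +                           // 1-6 hash marks
    "(?<content>.+?)" +                                // header content
    "(?<closing><(?:br|/\\s*br|br\\s*/)>)",            // <br>, </br>, or <br />
  "g"
);

const partlyConverted = sampleText.replace(
  consumingPattern,
  "$<preamble><h1>$<content></h1>$<closing>"
);

// #My Novel and #Chapter 2 are converted; #Chapter1 is left untouched
// because the <br /> before it was consumed by the first match.
console.log(partlyConverted);
```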

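To capture #Chapter1-like constructs anyway, we can use a lookbehind assertion. A lookbehind requires the prefix to be present but does not consume it, so it can match text that lies before the current search position, including text consumed by the previous match. Because the prefix is no longer part of the match, it also no longer has to be re-emitted in the replacement string: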

Replace (?<preamble> with (?<=
Then in the replacement string, remove the ${preamble} portion.

The overall search expression now looks like:

(?<=                 # lookbehind replaces the former preamble capture
    (
        ([<]\/\w+\d*[>])|([<]\w+\d*\s*\/[>])   # </tag> or <tag />
    )
    \s*              # optional whitespace
)

(?<hashmarks>
    \#{1,6}          # 1-6 hash marks
)

(?<content>
    .+?              # header content
)

(?<closing>
    ([<](br|\/\s*br|br\s*\/)[>])   # <br>, </br>, or <br />
)

And the replacement string looks like:

<h1>${content}</h1>${closing}

The output now comes out as expected:

</div><h1>My Novel</h1><br />
<h1>Chapter1</h1><br />
It was a dark and stormy night<br />
<h1>Chapter 2</h1><br />
The End
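
For readers who want to try the lookbehind version outside .NET, here is a minimal TypeScript sketch. It assumes an ES2018-capable engine (lookbehind and named groups) and an input reconstructed from the output above; the variable names are illustrative only. The pattern is the one from this answer, written without whitespace and with $<name> replacement syntax.

```typescript
// Assumed sample input, reconstructed from the output shown above.
const inputText =
  "</div>#My Novel<br />\n" +
  "#Chapter1<br />\n" +
  "It was a dark and stormy night<br />\n" +
  "#Chapter 2<br />\n" +
  "The End";

// The preamble is now a lookbehind: it must be present, but it is not consumed.
const lookbehindPattern = new RegExp(
  "(?<=(?:</\\w+\\d*>|<\\w+\\d*\\s*/>)\\s*)" + // preceded by </tag> or <tag />, then whitespace
    "(?<hashmarks>#{1,6})" +                   // 1-6 hash marks
    "(?<content>.+?)" +                        // header content
    "(?<closing><(?:br|/\\s*br|br\\s*/)>)",    // <br>, </br>, or <br />
  "g"
);

// No preamble reference in the replacement, since it was never part of the match.
const converted = inputText.replace(
  lookbehindPattern,
  "<h1>$<content></h1>$<closing>"
);

console.log(converted);
```

The <br /> that closes one header can now also satisfy the lookbehind for the next header, so adjacent headers are all converted.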

Oren Trutner 2009-08-08 00:07:52

You are the man! The lookbehind assertion worked like a charm.

JohnFx 2009-08-08 00:27:22

You also ought to be able to replace `(?<closing>` with a look*ahead* assertion: `(?=`

Ben Blank 2009-08-08 00:38:26
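
The lookahead suggestion above can be sketched in TypeScript as follows. This is an illustration under assumptions, not tested against .NET: the input is reconstructed from the output in the answer, the names are hypothetical, and the closing reference is dropped from the replacement because a lookahead does not consume the <br />.

```typescript
// Assumed sample input, reconstructed from the output shown in the answer.
const sourceText =
  "</div>#My Novel<br />\n" +
  "#Chapter1<br />\n" +
  "It was a dark and stormy night<br />\n" +
  "#Chapter 2<br />\n" +
  "The End";

// Both the preamble and the closing tag are assertions, so only the
// hash marks and the header content are part of the match.
const assertionOnlyPattern = new RegExp(
  "(?<=(?:</\\w+\\d*>|<\\w+\\d*\\s*/>)\\s*)" + // preceded by </tag> or <tag />, then whitespace
    "(?<hashmarks>#{1,6})" +                   // 1-6 hash marks
    "(?<content>.+?)" +                        // header content
    "(?=<(?:br|/\\s*br|br\\s*/)>)",            // followed by <br>, </br>, or <br />
  "g"
);

// The closing tag stays in the text untouched, so it is not re-emitted here.
const convertedWithLookahead = sourceText.replace(
  assertionOnlyPattern,
  "<h1>$<content></h1>"
);

console.log(convertedWithLookahead);
```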
