I need to replace all instances of a match but only within certain tags.

For example, consider an HTML page that has a <body>...</body> section. Within these tags I need to replace all occurrences of, say:

{embed=xxx}

to

<a href="xxx">xxx</a>

I can do this for the whole page using something like (attempt #1):

match={embed=(.*?)}
replace=<a href="$1">$1</a>

but this replaces matches everywhere on the page, including sections where I do not want anything replaced, notably the head section.

When I try to add conditions around the match defined above like this (attempt #2):

match=(<body.*?)(?:({embed=(.*?)})+)(.*?)(</body)
replace=$1<a href="$3">$3</a>$4$5

it only replaces the first item.

So if I was using this sample text data to search:

<head>
{embed=zzz}
</head>
<body>
{embed=aaa}<br />
{embed=bbb}<br />
{embed=ccc}<br />
</body>

I get:

<head>
{embed=zzz}
</head>
<body>
<a href="aaa">aaa</a>aaa<br />
{embed=bbb}<br />
{embed=ccc}<br />
</body>

Ideally the output I want is:

<head>
{embed=zzz}
</head>
<body>
<a href="aaa">aaa</a><br />
<a href="bbb">bbb</a><br />
<a href="ccc">ccc</a><br />
</body>

I know I'm probably overcomplicating things, but regex is like oil to my brain's water: they just don't mix.

A: 

The .NET method you are looking for is System.Text.RegularExpressions.Regex.Replace(inputString, replacementString).

It replaces every match of the pattern in the input string with the replacement string.

Example Usage:

Dim regex As New System.Text.RegularExpressions.Regex("(<body.*?)(?:({embed=(.*?)})+)(.*?)(</body)")
Dim newString = regex.Replace(inputString, "$1<a href=""$3"">$3</a>$4$5")

The documentation is here.

(Sorry about the Visual Basic example. It is just what comes to mind when .NET is mentioned.)
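
For comparison, here is a small sketch of the same call using the simpler pattern from attempt #1 on a shortened version of the sample text. The module name, variable names, and test string are placeholders, and the braces are escaped only for readability. It should show that a single Replace call substitutes every match, including the one inside <head>, so the pattern itself has to restrict where a match may occur.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceAllDemo
    Sub Main()
        ' Shortened sample input (hypothetical test data based on the question).
        Dim input As String = "<head>{embed=zzz}</head><body>{embed=aaa}<br />{embed=bbb}<br /></body>"

        ' Attempt #1 pattern: matches every {embed=...} anywhere in the input.
        Dim embedRegex As New Regex("\{embed=(.*?)\}")

        ' Replace substitutes all matches, not just the first one.
        Dim result As String = embedRegex.Replace(input, "<a href=""$1"">$1</a>")

        Console.WriteLine(result)
    End Sub
End Module
```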

huntaub
Thanks for the info. While I am writing my app in .NET, my regex tester app must not be written in .NET because I was not seeing that behavior.
NFX
A: 

This calls for lookbehind and lookahead. Note that unbounded repetition inside a lookbehind is supported by .NET but not by many other regex engines. Try using this:

match=(?<=<body[^>]*>.*){embed=(.*?)}(?=.*</body>)

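The lookbehind asserts that a <body> tag, followed by anything, appears before the embed tag; it has zero width, so it consumes none of that text. The lookahead does the same for the end tag: it asserts that </body> appears somewhere after the match. The only capture group left is the embed value, so the replacement becomes <a href="$1">$1</a>.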
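
A minimal VB.NET sketch of this pattern applied to the sample text from the question. The module and variable names are made up for illustration. Two things here are my assumptions and not part of the pattern as posted: RegexOptions.Singleline, so that . can cross the line breaks between <body> and the embed tags, and the escaped braces, which are there only for clarity.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BodyOnlyReplaceDemo
    Sub Main()
        ' Sample text from the question, joined with line feeds.
        Dim input As String = String.Join(vbLf, New String() {
            "<head>",
            "{embed=zzz}",
            "</head>",
            "<body>",
            "{embed=aaa}<br />",
            "{embed=bbb}<br />",
            "{embed=ccc}<br />",
            "</body>"})

        ' Lookbehind: an opening <body ...> tag must occur somewhere before the match.
        ' Lookahead: a closing </body> tag must occur somewhere after the match.
        Dim pattern As String = "(?<=<body[^>]*>.*)\{embed=(.*?)\}(?=.*</body>)"

        ' Singleline (assumption) lets "." match newlines, so the lookarounds
        ' can reach tags that sit on other lines.
        Dim result As String = Regex.Replace(
            input, pattern, "<a href=""$1"">$1</a>", RegexOptions.Singleline)

        Console.WriteLine(result)
    End Sub
End Module
```

Run against the sample, this should leave {embed=zzz} in the head untouched and turn the three embed tags in the body into links. Because both lookarounds are zero-width, each embed tag is its own match, which is what lets Replace handle all of them in one call.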

J Hall
This works great.
NFX