ansaurus

Question

Stumped on Regex with zero-width positive lookbehind assertion

Answer 1

A:

Get Expresso; it's a great tool for writing and testing regexes.

To be honest though, I don't know if you can write one to do what you need.
Don't forget that some HTML tags don't need to be closed to be valid HTML, and some are self-closing in XHTML.

e.g. <hr>, <br/>, <p>, <li>, <img> or <img />, etc.

You might be better off just keeping a list of valid tags and changing every < and > sign that isn't part of a valid tag to &lt; and &gt;.
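A rough sketch of that whitelist idea, in case it helps. The class name, the tag list and the tag pattern are all made up for illustration (the pattern only handles simple tags without attributes), so treat it as a starting point rather than a finished sanitizer:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagWhitelist
{
    // Hypothetical whitelist; replace with the tags you actually allow.
    static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "b", "i", "p", "br", "hr", "li" };

    // Matches a simple tag (opening, closing or self-closing, no attributes)
    // or a single stray angle bracket. Group 1 captures the tag name.
    static readonly Regex TagOrBracket = new Regex(@"</?(\w+)\s*/?>|[<>]");

    public static string Escape(string input)
    {
        return TagOrBracket.Replace(input, m =>
        {
            // Group 1 only succeeds when the tag alternative matched.
            if (m.Groups[1].Success && AllowedTags.Contains(m.Groups[1].Value))
                return m.Value; // whitelisted tag: keep it unchanged

            // Everything else: escape each angle bracket in the match.
            return m.Value.Replace("<", "&lt;").Replace(">", "&gt;");
        });
    }
}
```

With this list, TagWhitelist.Escape("<b>x</b> <script>1<2") keeps the <b> tags and escapes the rest, giving "<b>x</b> &lt;script&gt;1&lt;2".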

Chad 2009-09-02 19:30:03

Answer 2

+1 A:

This is a lot trickier than it seems at first (as you're discovering). It's much easier to come at it from the other direction: use one regex to match an HTML tag OR an angle bracket. If it's a tag you found, you plug it back in; otherwise you convert it. The Replace method with a MatchEvaluator parameter is good for that:

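using System.Text.RegularExpressions;

// Matches either a simple tag or a single angle bracket.
static string ScrubInput(string input)
{
  return Regex.Replace(input, @"</?\w+>|[<>]", GetReplacement);
}

// Called once per match to decide what replaces it.
static string GetReplacement(Match m)
{
  switch (m.Value)
  {
    case "<":
      return "&lt;";
    case ">":
      return "&gt;";
    default:
      return m.Value;  // a tag: plug it back in unchanged
  }
}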

You'll notice that my tag regex -- </?\w+> -- is more restrictive than yours. I don't know if mine is exactly right for your needs, but I would advise against using <[^<>]+> -- it would find a match in something like "if (x<3||x>9)".
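To see the difference between the two tag patterns, here is a small comparison. The PatternComparison class and FirstMatch helper are names invented for this illustration only:

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternComparison
{
    // Returns the first substring matched by the pattern, or null if there is no match.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

FirstMatch(@"<[^<>]+>", "if (x<3||x>9)") returns "<3||x>", so the loose pattern treats part of the comparison as a tag. FirstMatch(@"</?\w+>", "if (x<3||x>9)") returns null, because "3" is not immediately followed by ">". Keep in mind that </?\w+> also won't match tags with attributes or self-closing tags like <br/>, so you may need to loosen it a little for your input.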

Alan Moore 2009-09-02 21:40:46
