I'm trying to remove the object tag from a text file:

    <object classid="clsid:F08DF954-8592-11D1-B16A-00C0F0283628" id="Slider1" width="100" height="50">
      <param name="BorderStyle" value="1" />
      <param name="MousePointer" value="0" />
      <param name="Enabled" value="1" />
      <param name="Min" value="0" />
      <param name="Max" value="10" />
    </object>

My regex so far is:

html = Regex.Replace(html, @"]>(?:.?)?", "", RegexOptions.IgnoreCase);

The inner param tags are not removed.

A: 

If I understand what you're asking, this will do it:

$line =~ s/<object.*?>.*?<\/object>//is;

That's Perl, so a few notes on the syntax:

  • ? after a quantifier makes it non-greedy, i.e. it matches up to the first possible end of the pattern rather than the last
  • /i makes the match case-insensitive
  • /s lets . match newlines too, so the pattern can match across line breaks
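
To make the effect of those flags concrete, here is a minimal sketch in Python rather than Perl (the `re` module has the same lazy quantifier, and `re.IGNORECASE` / `re.DOTALL` correspond to /i and /s). The sample input is an assumption modeled on the question's markup, with an uppercase closing tag added to exercise the case-insensitive flag.

```python
import re

# Sample input modeled on the question's markup (the uppercase closing
# tag is an assumption, added to show why case-insensitivity matters).
html = """<p>before</p>
<object classid="clsid:F08DF954-8592-11D1-B16A-00C0F0283628" id="Slider1" width="100" height="50">
  <param name="BorderStyle" value="1" />
  <param name="Max" value="10" />
</OBJECT>
<p>after</p>"""

# re.IGNORECASE plays the role of /i, re.DOTALL the role of /s.
pattern = re.compile(r"<object.*?>.*?</object>", re.IGNORECASE | re.DOTALL)
cleaned = pattern.sub("", html)

# Without DOTALL, "." stops at every newline, so the multi-line
# <object> block is never matched and the text comes back unchanged.
unchanged = re.sub(r"<object.*?>.*?</object>", "", html, flags=re.IGNORECASE)

print(cleaned)
```

With both flags set, the whole block including the inner param tags is removed; dropping the single-line flag leaves the input untouched.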
scotchi
+1  A: 

You should be able to specify the <object> tag as part of your expression and match everything up to the </object> tag.

html = Regex.Replace(html, @"<object.*?</object>", "", RegexOptions.Singleline);
jheddings
A: 

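This regex might work for you, but note that it is greedy: it matches from the first <object to the last </object> in the input, and it needs RegexOptions.Singleline to span line breaks: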

<object.+</object>
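
The difference between the greedy and the lazy form only shows up when the input contains more than one object element. A small sketch, again in Python rather than C# and with a made-up two-object input, assuming the single-line (dot matches newline) option is on:

```python
import re

# Hypothetical input with two <object> elements and content between them.
html = (
    '<object id="a"><param name="Min" value="0" /></object>'
    "<p>keep me</p>"
    '<object id="b"><param name="Max" value="10" /></object>'
)

# Greedy: ".+" runs to the end of the input and backtracks to the LAST
# </object>, so the paragraph between the two objects is removed as well.
greedy = re.sub(r"<object.+</object>", "", html, flags=re.DOTALL)

# Lazy: ".+?" stops at the FIRST </object>, so each element is removed
# separately and the paragraph survives.
lazy = re.sub(r"<object.+?</object>", "", html, flags=re.DOTALL)

print(repr(greedy))
print(repr(lazy))
```

So the greedy pattern is only safe when the file is known to contain a single object element.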

That said, I would recommend using HtmlAgilityPack instead.
It parses the HTML into a DOM,
so you can work with it much as you would with XmlDocument:

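using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// "//object" finds <object> nodes at any depth; SelectNodes returns null when nothing matches.
HtmlNodeCollection objects = doc.DocumentNode.SelectNodes("//object");
if (objects != null) {
    foreach (HtmlNode obj in objects) {
        obj.ParentNode.RemoveChild(obj);
    }
}
doc.Save("file.htm");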
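
For comparison, the same parser-based idea can be sketched with Python's standard-library `html.parser` instead of HtmlAgilityPack. This is only an illustration under stated assumptions: `ObjectStripper` and `strip_objects` are hypothetical names, and the sketch drops comments and doctype declarations because it does not re-emit them. Unlike the regexes, it tracks nesting depth, so an object nested inside another object is handled correctly.

```python
from html.parser import HTMLParser


class ObjectStripper(HTMLParser):
    """Re-emit the input, skipping everything inside <object>...</object>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # number of <object> elements we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "object":
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <param ... /> never change the depth.
        if tag != "object" and self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "object":
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")


def strip_objects(html):
    parser = ObjectStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    '<p>a</p><object id="x"><param name="n" value="1" />'
    "<object><b>in</b></object>tail</object><p>b &amp; c</p>"
)
print(strip_objects(sample))
```

The parser lowercases tag names, so `<OBJECT>` is handled without any extra flag.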
Dmytrii Nagirniak