Alright, an easy one for you guys. We are using ActiveReport's RichTextBox to display some random bits of HTML code.

The HTML tags supported by ActiveReport are listed here: http://www.datadynamics.com/Help/ARNET3/ar3conSupportedHtmlTagsInRichText.html

An example of what I want to do is replace any match of <div style="text-align:*</div> with <p style="text-align:*</p> in order to use a supported tag for text alignment.

I have found the following regex expression to find the correct match in my html input:

<div style=\"text-align:(.*?)</div>

However, I can't find a way to keep the text contained in the tags after my replacement. Any clue? Is it just me, or are regexes generally a PITA? :)

    private static readonly IDictionary<string, string> _replaceMap =
        new Dictionary<string, string>
            {
                {"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:(.*?)</p>"}
            };

    public static string FormatHtml(string html)
    {
        foreach(var pair in _replaceMap)
        {
            html = Regex.Replace(html, pair.Key, pair.Value);
        }

        return html;
    }

Thanks!

+3  A: 

Use $1:

{"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:$1</p>"}
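For illustration, here is a minimal self-contained sketch of how that `$1` replacement could plug into the `FormatHtml` method from the question. The class name `HtmlFormatter`, the `Main` method, and the sample input are assumptions made for this example; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the question does not show the enclosing type.
public static class HtmlFormatter
{
    // Maps a regex pattern to its replacement string.
    // In the replacement, $1 stands for whatever capture group 1 matched.
    private static readonly IDictionary<string, string> _replaceMap =
        new Dictionary<string, string>
            {
                {"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:$1</p>"}
            };

    public static string FormatHtml(string html)
    {
        // Apply every pattern/replacement pair in turn.
        foreach (var pair in _replaceMap)
        {
            html = Regex.Replace(html, pair.Key, pair.Value);
        }

        return html;
    }

    public static void Main()
    {
        // Sample input invented for this example.
        string input = "<div style=\"text-align:center\">foo</div>";

        // Prints: <p style="text-align:center">foo</p>
        Console.WriteLine(FormatHtml(input));
    }
}
```

Keep in mind that `.` does not match newlines by default, so a div whose content spans several lines would need `RegexOptions.Singleline`, and nested divs would not be handled correctly by this pattern.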

Note that you could simplify this to:

{"<div (style=\"text-align:(?:.*?))</div>", "<p $1</p>"}

Also it is generally a better idea to use an HTML parser like HtmlAgilityPack than trying to parse HTML using regular expressions. Here's how you could do it:

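// Requires the HtmlAgilityPack library (using HtmlAgilityPack;).
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Rename every <div> element to <p>; attributes and content are kept.
foreach (var e in doc.DocumentNode.Descendants("div"))
    e.Name = "p";
// Write the modified document to standard output.
doc.Save(Console.Out);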

Result:

<p style="text-align:center">foo</p><p style="text-align:center">bar</p>
Mark Byers
+3  A: 

Instead of using regexes, you should use a tool that is better suited to parsing and modifying HTML. I would recommend the Html Agility Pack for this; it was written to do just what you need.

Rune Grimstad
Thanks for the suggestion, but I'm only looking for a quick, easy way to solve this without any external libraries. I'll make sure to have a look at the Html Agility Pack though; it could be useful on some other projects!
matthew.perron
matthewpw: I think you're missing his point. HtmlAgilityPack *is* a quick and easy way to solve your task - regex is not designed for parsing HTML and that's why you're finding it difficult.
Mark Byers
Check out this answer to a similar question. It's a StackOverflow classic: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Rune Grimstad
I've taken a look at that SO classic, good read! Also, I'll check out Agility Pack if I ever need to mess with HTML again. But really, the simple regex replacement is what I was looking for: ActiveReport implements a very lame HTML renderer; the subset of supported tags is minimalistic and the HTML I want to 'sanitize' is really nothing complex. But I got your point though, and Rune's answer is definitely +1 material!
matthew.perron
Haha! Ok. Good luck on that! :-)
Rune Grimstad