Can somebody provide a regular expression that will:

  1. find a chunk that starts with [% and ends with %]
  2. within that chunk, replace all XML special characters with:
    &quot; &apos; &lt; &gt; &amp;
  3. leave everything between <%= %> or <%# %> as is, except make sure that there is a space after <%# or <%= and before %>; for example, <%=Integer.MaxValue%> should become <%= Integer.MaxValue %>
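Requirement 3 on its own is a single substitution. A minimal sketch, assuming expressions stay on one line and never contain `%>`; `ExpressionSpacing` and `Normalize` are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class ExpressionSpacing
{
    // "<%", then "=" or "#" (group 1), optional whitespace, the expression itself
    // (group 2, lazy so it stops before trailing whitespace), optional whitespace, "%>".
    private static readonly Regex Tag = new Regex(@"<%([=#])\s*(.*?)\s*%>");

    // Rewrites every <%= ... %> and <%# ... %> so there is exactly one space
    // after the opening tag and exactly one space before %>.
    public static string Normalize(string text) =>
        Tag.Replace(text, "<%$1 $2 %>");
}
```

Plain `<% ... %>` blocks are left alone because the pattern requires `=` or `#` right after `<%`.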

source:

[% 'test' <mtd:ddl id="asdf" runat="server"/> & <%= Integer.MaxValue%> %]

result:

&apos;test&apos; &lt;mtd:ddl id=&quot;asdf&quot; runat=&quot;server&quot;/&gt; &amp; <%= Integer.MaxValue %>
+1  A: 
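// Requires: using System.Text; using System.Text.RegularExpressions;
private void button1_Click(object sender, EventArgs e)
{
    // Lazy .*? so that two [% ... %] chunks on the same line are matched separately.
    Regex reg = new Regex(@"\[%(?<b1>.*?)%\]");
    richTextBox1.Text = reg.Replace(textBox1.Text, new MatchEvaluator(f1));
}

// Escapes everything inside the chunk except <% ... %> blocks, which are copied
// unchanged (the spacing inside them is not normalized).
static string f1(Match m)
{
    StringBuilder sb = new StringBuilder();
    string body = m.Groups["b1"].Value;
    // The text between the <% ... %> blocks...
    string[] a = Regex.Split(body, "<%[^%>]*%>");
    // ...and the blocks themselves, in the same order.
    MatchCollection col = Regex.Matches(body, "<%[^%>]*%>");
    for (int i = 0; i < a.Length; i++)
    {
        // & is replaced first so the entities added afterwards are not escaped again.
        sb.Append(a[i].Replace("&", "&amp;").Replace("'", "&apos;").Replace("\"", "&quot;").Replace("<", "&lt;").Replace(">", "&gt;"));
        if (i < col.Count)
            sb.Append(col[i].Value);
    }
    return sb.ToString();
}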

Test1:

[% 'test' <mtd:ddl id="asdf" runat="server"/> & <%= Integer.MaxValue%> fdas<% hi%> 321%]

result:

 &apos;test&apos; &lt;mtd:ddl id=&quot;asdf&quot; runat=&quot;server&quot;/&gt; &amp; <%= Integer.MaxValue%> fdas<% hi%> 321
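As the output shows, this approach copies `<%= Integer.MaxValue%>` through without adding the space before `%>`, and it also protects the plain `<% hi%>`, which the question doesn't ask for. A hedged sketch of one way to tighten it while keeping the same escape-the-gaps idea: walk the `<%=`/`<%#` matches by index, escape the text between them, and re-emit each expression with one space on each side. `ChunkEscaper` is a hypothetical name, and expressions are assumed not to contain `%>`:

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class ChunkEscaper
{
    // One [% ... %] chunk; lazy so several chunks on a line stay separate.
    private static readonly Regex Chunk = new Regex(@"\[%(?<b1>.*?)%\]");

    // Only <%= ... %> and <%# ... %> are protected; plain <% ... %> gets escaped.
    private static readonly Regex Expr = new Regex(@"<%(?<tag>[=#])\s*(?<body>.*?)\s*%>");

    public static string Process(string input) => Chunk.Replace(input, EscapeChunk);

    private static string EscapeChunk(Match m)
    {
        string body = m.Groups["b1"].Value;
        var sb = new StringBuilder();
        int last = 0;
        foreach (Match e in Expr.Matches(body))
        {
            // Escape the plain text between the previous expression and this one.
            sb.Append(Escape(body.Substring(last, e.Index - last)));
            // Re-emit the expression with exactly one space on each side.
            sb.Append("<%").Append(e.Groups["tag"].Value).Append(' ')
              .Append(e.Groups["body"].Value).Append(" %>");
            last = e.Index + e.Length;
        }
        sb.Append(Escape(body.Substring(last)));   // text after the last expression
        return sb.ToString();
    }

    // & must be replaced first so the entities added afterwards are not re-escaped.
    private static string Escape(string s) =>
        s.Replace("&", "&amp;").Replace("'", "&apos;").Replace("\"", "&quot;")
         .Replace("<", "&lt;").Replace(">", "&gt;");
}
```

Under this version, `<% hi%>` from Test1 is escaped as ordinary text (`&lt;% hi%&gt;`), because only `<%=` and `<%#` are protected.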
ebattulga
+2  A: 

I used two regular expressions: the first matches the general [% ... %] form, the second deals with the inner plumbing.

For the XML encoding I used an obscure little method in System.Security: SecurityElement.Escape, which replaces <, >, ", ' and & with &lt;, &gt;, &quot;, &apos; and &amp;. I fully qualified it in the code below for emphasis. Another option would be HttpUtility.HtmlEncode, but that may require a reference to System.Web depending on where you're using this, and it does not produce &apos; for apostrophes.
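A quick check of what `SecurityElement.Escape` produces for the characters in the question. The wrapper class and method names are hypothetical; the call itself is the standard `System.Security.SecurityElement.Escape(string)`:

```csharp
using System.Security;

public static class EscapeDemo
{
    // SecurityElement.Escape replaces the five XML special characters
    // (< > " ' &) with &lt; &gt; &quot; &apos; &amp; in a single pass,
    // so an & it has just written is never escaped a second time.
    public static string XmlEscape(string text) => SecurityElement.Escape(text);
}
```

Because it works in a single pass, there is no need to worry about the replacement order that a chain of `string.Replace` calls requires.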

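using System;
using System.Text.RegularExpressions;

string[] inputs = { @"[% 'test' <mtd:ddl id=""asdf"" runat=""server""/> & <%= Integer.MaxValue %> %]",
    @"[% 'test' <mtd:ddl id=""asdf"" runat=""server""/> & <%=Integer.MaxValue %> %]",
    @"[% 'test' <mtd:ddl id=""asdf"" runat=""server""/> & <%# Integer.MaxValue%> %]",
    @"[% 'test' <mtd:ddl id=""asdf"" runat=""server""/> & <%#Integer.MaxValue%> %]",
};
// Outer pattern: one [% ... %] chunk (lazy, so chunks on the same line stay separate).
string pattern = @"(?<open>\[%)(?<content>.*?)(?<close>%])";
// Inner pattern: plain text up to the next <%= or <%# expression, or up to the end of the
// chunk (the |$ branch), so text after the last expression is escaped as well.
string expressionPattern = @"(?<content>.*?)(?:(?<tag><%[=#])\s*(?<expression>.*?)\s*%>|$)";

foreach (string input in inputs)
{
    string result = Regex.Replace(input, pattern, m =>
        m.Groups["open"].Value +
        Regex.Replace(m.Groups["content"].Value, expressionPattern, expressionMatch =>
            // Escape() turns < > " ' & into &lt; &gt; &quot; &apos; &amp;
            System.Security.SecurityElement.Escape(expressionMatch.Groups["content"].Value) +
            // Re-emit the expression (if any) with exactly one space on each side.
            (expressionMatch.Groups["tag"].Success
                ? expressionMatch.Groups["tag"].Value + " " +
                  expressionMatch.Groups["expression"].Value + " %>"
                : "")) +
        m.Groups["close"].Value);

    Console.WriteLine("Before: {0}", input);
    Console.WriteLine("After: {0}", result);
}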


Results:

Before: [% 'test' <mtd:ddl id="asdf" runat="server"/> & <%= Integer.MaxValue %> %]
After: [% &apos;test&apos; &lt;mtd:ddl id=&quot;asdf&quot; runat=&quot;server&quot;/&gt; &amp; <%= Integer.MaxValue %> %]
Before: [% 'test' <mtd:ddl id="asdf" runat="server"/> & <%=Integer.MaxValue %> %]
After: [% &apos;test&apos; &lt;mtd:ddl id=&quot;asdf&quot; runat=&quot;server&quot;/&gt; &amp; <%= Integer.MaxValue %> %]
Before: [% 'test' <mtd:ddl id="asdf" runat="server"/> & <%# Integer.MaxValue%> %]
After: [% &apos;test&apos; &lt;mtd:ddl id=&quot;asdf&quot; runat=&quot;server&quot;/&gt; &amp; <%# Integer.MaxValue %> %]
Before: [% 'test' <mtd:ddl id="asdf" runat="server"/> & <%#Integer.MaxValue%> %]
After: [% &apos;test&apos; &lt;mtd:ddl id=&quot;asdf&quot; runat=&quot;server&quot;/&gt; &amp; <%# Integer.MaxValue %> %]

EDIT: if you don't care about preserving the opening/closing [% %] in the final result, change the pattern to:

string pattern = @"\[%(?<content>.*?)%]";

Then be sure to remove references to m.Groups["open"].Value and m.Groups["close"].Value.
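For reference, the whole delimiter-free variant might look like the sketch below. One assumption goes beyond the answer as originally posted: the expression pattern has an extra `|$` alternative so that text after the last `<%= %>`/`<%# %>` block, or a chunk with no such block at all, is still escaped. `StripDelimiters` is a hypothetical name:

```csharp
using System.Security;
using System.Text.RegularExpressions;

public static class StripDelimiters
{
    // [% ... %] without named open/close groups, since they are dropped from the output.
    private const string Pattern = @"\[%(?<content>.*?)%]";

    // Plain text up to the next <%= / <%# expression, or up to the end of the chunk.
    private const string ExpressionPattern =
        @"(?<content>.*?)(?:(?<tag><%[=#])\s*(?<expression>.*?)\s*%>|$)";

    public static string Process(string input) =>
        Regex.Replace(input, Pattern, m =>
            Regex.Replace(m.Groups["content"].Value, ExpressionPattern, e =>
                SecurityElement.Escape(e.Groups["content"].Value) +
                // Only re-emit a tag when the expression branch matched.
                (e.Groups["tag"].Success
                    ? e.Groups["tag"].Value + " " + e.Groups["expression"].Value + " %>"
                    : "")));
}
```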

Ahmad Mageed
(THANK YOU)^4.
epitka
Interesting: if I read the string in from a file, this does not work. The string is exactly the same as the one you used (the first one), except without the doubled ("") quotes. Any idea why?
epitka
@epitka: is the data all on one line or split across lines? If it's split, it won't match, because . doesn't match newlines by default. The second set of double quotes is only there to escape it in code as a verbatim string using the @ symbol, so your actual input would naturally feature only one double quote; that shouldn't matter. I placed the sample data in a text file and changed the first line to: string[] inputs = File.ReadAllLines(@"c:\temp.txt") and it worked fine. I also tried a single string using File.ReadAllText(...). Can you show us how you're reading it from the file and what the data looks like?
Ahmad Mageed
You could add the RegexOptions.Singleline option to both Regex.Replace calls (pass it as the final parameter; there's an overloaded Replace method for it). That makes . match newlines too, so the patterns work when the data spans multiple lines.
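A small illustration of the difference, using the same `[% ... %]` pattern. The class and method names are hypothetical; `RegexOptions.Singleline` and the four-argument `Regex.Replace(input, pattern, evaluator, options)` overload are standard .NET:

```csharp
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    private const string Pattern = @"\[%(?<content>.*?)%]";

    // Without Singleline, '.' does not match '\n', so a chunk spanning lines is not found.
    public static bool MatchesDefault(string input) => Regex.IsMatch(input, Pattern);

    // With Singleline, '.' matches every character, including '\n'.
    public static bool MatchesSingleline(string input) =>
        Regex.IsMatch(input, Pattern, RegexOptions.Singleline);

    // The option goes last in the Regex.Replace(input, pattern, evaluator, options) overload.
    public static string UnwrapChunks(string input) =>
        Regex.Replace(input, Pattern, m => m.Groups["content"].Value, RegexOptions.Singleline);
}
```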
Ahmad Mageed
I also used File.ReadAllText(path), but that was happening in an assembly written in VB.NET, and the text read there was passed to C#, where this parse method was. I don't know if that is the reason, but if I move the read into the C# assembly and pass in the input string, it works fine. What is more interesting is that for the other solution (ebattulga's) it did not matter; it worked in both cases.
epitka
A: 

I think the code will be clearer without regex. I would tend to write a separate method (and unit test) for each item of your spec, then chain them together.
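A minimal regex-free sketch of that idea, with one method per spec item chained together. This is an illustration, not the answerer's code: `TemplateEscaper` and its method names are hypothetical, and it assumes chunks are not nested and expressions never contain `%>`:

```csharp
using System;
using System.Text;

public static class TemplateEscaper
{
    // Entry point: the three spec items chained together.
    public static string Process(string input) => ProcessChunks(input, ProcessChunkBody);

    // Spec item 1: find each [% ... %] chunk and replace it with processChunk(inner text).
    public static string ProcessChunks(string input, Func<string, string> processChunk)
    {
        var sb = new StringBuilder();
        int pos = 0;
        while (true)
        {
            int open = input.IndexOf("[%", pos, StringComparison.Ordinal);
            if (open < 0) break;
            int close = input.IndexOf("%]", open + 2, StringComparison.Ordinal);
            if (close < 0) break;                          // unterminated chunk: copied as-is below
            sb.Append(input, pos, open - pos);             // text before the chunk
            sb.Append(processChunk(input.Substring(open + 2, close - open - 2)));
            pos = close + 2;
        }
        sb.Append(input, pos, input.Length - pos);         // text after the last chunk
        return sb.ToString();
    }

    // Spec item 2: escape the chunk text, except <%= %> / <%# %> expressions.
    public static string ProcessChunkBody(string body)
    {
        var sb = new StringBuilder();
        int pos = 0;
        int start;
        while ((start = FindExpression(body, pos)) >= 0)
        {
            int end = body.IndexOf("%>", start + 3, StringComparison.Ordinal);
            if (end < 0) break;                            // unterminated expression: escaped below
            sb.Append(EscapeXml(body.Substring(pos, start - pos)));
            sb.Append(NormalizeExpression(body.Substring(start, end + 2 - start)));
            pos = end + 2;
        }
        sb.Append(EscapeXml(body.Substring(pos)));
        return sb.ToString();
    }

    // Spec item 3: "<%=x%>" becomes "<%= x %>" (exactly one space on each side).
    public static string NormalizeExpression(string expr) =>
        "<%" + expr[2] + " " + expr.Substring(3, expr.Length - 5).Trim() + " %>";

    // & first, so the entities added afterwards are not escaped a second time.
    public static string EscapeXml(string s) =>
        s.Replace("&", "&amp;").Replace("'", "&apos;").Replace("\"", "&quot;")
         .Replace("<", "&lt;").Replace(">", "&gt;");

    // Index of the next "<%=" or "<%#" at or after pos, or -1 if there is none.
    private static int FindExpression(string s, int pos)
    {
        for (int i = s.IndexOf("<%", pos, StringComparison.Ordinal);
             i >= 0 && i + 2 < s.Length;
             i = s.IndexOf("<%", i + 1, StringComparison.Ordinal))
        {
            if (s[i + 2] == '=' || s[i + 2] == '#') return i;
        }
        return -1;
    }
}
```

Each method is small enough to unit test on its own, which is the main argument for this style over a single dense pattern.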

See also "When not to use Regex in C# (or Java, C++ etc)"

Ian Ringrose