+3  Q: 

ANTLR or Regex?

I'm writing a CMS in ASP.NET/C#, and I need to process templates like this on every page request:

<html>
<head>
    <title>[Title]</title>
</head>
<body>
    <form action="[Action]" method="get">
        [TextBox Name="Email", Background=Red]
        [Button Type="Submit"]
    </form>
</body>
</html>

and replace the [...] of course.

My question is how I should implement it: with ANTLR or with Regex? Which will be faster? Note that if I implement it with ANTLR, I think I will also need to implement XML parsing, in addition to the [..] syntax.

I will need to implement parameters, etc.

EDIT: Please note that my regex could look something like this:

public override string ToString()
{
    return Regex.Replace(Input, @"\[
                                    \s*(?<name>\w+)\s*
                                    (?<parameter>
                                        [\s,]*
                                            (?<paramName>\w+)
                                            \s*
                                            =
                                            \s*
                                            (
                                                (?<paramValue>\w+)
                                                |
                                                (""(?<paramValue>[^""]*)"")
                                            )
                                    )*
                               \]", (match) =>
                                  {
                                      ...
                                  }, RegexOptions.IgnorePatternWhitespace);
}        
+1  A: 

The performance of ANTLR vs. RegEx depends on the RegEx implementation in C#. I know from experience that ANTLR is fast enough.

In ANTLR you can ignore certain content, like the XML. You can also scan for the [ and ] and only process what lies between them.

Both RegEx and ANTLR support your kind of parameters (I'm not sure about the "etc.").

In terms of development speed, RegEx is slightly faster for a case like this. You can use an online tool to develop the RegEx and see the capture groups while you edit it (search for "regex gskinner").

ANTLR, on the other hand, has excellent support for error messages: they show line/column numbers and what was wrong. RegEx doesn't have this support.
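To illustrate what position-aware diagnostics look like, here is a minimal hand-written sketch. It is not ANTLR and not a replacement for it; the class and method names (TagScanner, FindUnclosedTag) are made up for this example, and it only detects one kind of error, an opening [ that is never closed.

```csharp
using System;

public static class TagScanner
{
    // Hypothetical helper: returns null when every '[' has a matching ']',
    // otherwise a message with the 1-based line and column of the unclosed '['.
    public static string FindUnclosedTag(string input)
    {
        int line = 1, column = 0;
        int openLine = 0, openColumn = 0;
        bool inTag = false;

        foreach (char c in input)
        {
            if (c == '\n')
            {
                // A newline starts a new line; columns restart from zero.
                line++;
                column = 0;
                continue;
            }
            column++;

            if (c == '[' && !inTag)
            {
                // Remember where the tag started so it can be reported later.
                inTag = true;
                openLine = line;
                openColumn = column;
            }
            else if (c == ']' && inTag)
            {
                inTag = false;
            }
        }

        return inTag
            ? $"Unclosed '[' at line {openLine}, column {openColumn}"
            : null;
    }
}
```

A single Regex.Replace call would silently skip such a broken tag, because an unterminated [ simply does not match; getting a message like this out of a regex-only solution takes extra bookkeeping of this kind.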

A general approach for RegEx would be: create a "global scan" RegEx which finds the correct [...] groups in your content. Let the "..." be captured by a group, and then apply another RegEx to this smaller content (one which splits it on the equals signs and commas). This way you get the best runtime performance and it's easy to develop.
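The two-pass approach described above could be sketched roughly as follows. The names TagTemplate, Render and renderTag are invented for this example, and the patterns are simplified assumptions: a ] inside a quoted value would end the tag early, and any bracketed word in the page (for example array indexing in inline JavaScript) would also be treated as a tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagTemplate
{
    // Pass 1: find each [Name ...] group; everything after the name is
    // captured as "body" and handed to pass 2.
    private static readonly Regex TagPattern = new Regex(
        @"\[\s*(?<name>\w+)(?<body>[^\]]*)\]", RegexOptions.Compiled);

    // Pass 2: pull key=value pairs out of the body. A value is either a
    // quoted string or a bare word.
    private static readonly Regex ParamPattern = new Regex(
        @"(?<key>\w+)\s*=\s*(?:""(?<value>[^""]*)""|(?<value>\w+))",
        RegexOptions.Compiled);

    // renderTag is a hypothetical callback: it receives the tag name and its
    // parameters and returns the replacement markup.
    public static string Render(
        string input,
        Func<string, IDictionary<string, string>, string> renderTag)
    {
        return TagPattern.Replace(input, match =>
        {
            var parameters = new Dictionary<string, string>(
                StringComparer.OrdinalIgnoreCase);
            foreach (Match p in ParamPattern.Matches(match.Groups["body"].Value))
            {
                parameters[p.Groups["key"].Value] = p.Groups["value"].Value;
            }
            return renderTag(match.Groups["name"].Value, parameters);
        });
    }
}
```

For example, calling TagTemplate.Render(page, (name, args) => ...) on the template from the question would invoke the callback once for each of [Title], [Action], [TextBox ...] and [Button ...]. Whether this is actually faster than the single large pattern is something to measure rather than assume.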

Pindatjuh
Please look at the regex in my question. Do you think it will be faster than the most well-written ANTLR script for this case?
TTT
The RegEx in your question is slower than an average ANTLR implementation doing the same job. ANTLR, though, is hard to learn and very hard to implement correctly. So if you are willing to spend a lot of time learning ANTLR's grammar format, it will be slightly faster. You can also optimize this RegEx: use one RegEx to fetch all content between `[]` and then iterate over the matches to parse their parameters. This is faster because the RegEx is smaller (which is a rule of thumb when working with performance and RegEx).
Pindatjuh
Okay, so I'll implement this with ANTLR. I don't care about development time, and I know a little ANTLR. Thank you! Anyway, I'm still looking for more opinions. I will accept this answer tomorrow if the other opinions are the same.
TTT
+4  A: 

Whether the correct tool is RegEx or ANTLR or even something else entirely should be heavily dependent on your requirements. The best answer to a "what tool to use" question shouldn't be primarily based on performance, but on the right tool for the job.

RegEx is a text search tool. If all you need to do is pull strings out of strings then it's often the hammer of choice. You'll likely want a tool to help you build your RegEx. I'd recommend Expresso, but there are lots of options out there.

ANTLR is a compiler generator. If you need error messages and parse actions or any of the complicated things that come with a compiler then it's a good option.

It looks like what you're doing is XML search/replace. Have you considered XPath? That would be my suggestion.
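As a rough sketch of the XPath idea, the following uses System.Xml to locate the text nodes and attribute values that contain a placeholder. It assumes the page is well-formed XML, which real HTML often is not, and the class name XPathPlaceholders is invented for this example. Note that XPath only finds where the placeholders are; the [Name Param=Value] syntax inside them still has to be parsed by other means.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathPlaceholders
{
    // Hypothetical helper: lists the text nodes and attribute values
    // that contain a '[' character.
    public static List<string> Find(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        var found = new List<string>();
        // Union of two XPath 1.0 queries: text nodes and attributes containing '['.
        var nodes = doc.SelectNodes(
            "//text()[contains(., '[')] | //@*[contains(., '[')]");
        foreach (XmlNode node in nodes)
        {
            found.Add(node.Value.Trim());
        }
        return found;
    }
}
```

Since each XmlNode returned here has a settable Value, the replacement could be written back node by node, at the cost of parsing the whole document into a DOM on every request.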

Choosing the right tool for the job is definitely important, something that should be researched and thought out before development begins. In all cases, it's important to fully understand the program requirements before making any decisions. Do you have a specification for the project? If not, spending the time to come up with one will save you all the time that a poor tool choice can cost you.

Hope that helps!

Task
A: 

If the language you are parsing is regular, then regular expressions are certainly an option. If it is not, then ANTLR may be your only choice. If I understand these matters correctly, XML is not regular.

High Performance Mark