I have had problems using regexes to divide code into functional components. They can break, or they can take a long time to finish. The experience raises a question:
"When should I use a parser?"
You should use a parser when you are interested in the lexical or semantic meaning of text and the patterns can vary. Parsers are generally overkill when you are simply looking to match or replace patterns of characters, regardless of their functional meaning.
In your case, you seem to be interested in the meaning behind the text ("functional components" of code), so a parser would be the better choice. Parsers can, however, internally make use of regex, so they should not be regarded as mutually exclusive.
A parser does not have to be complicated, however. For example, if you are interested in C code blocks, you could simply parse nested groups of { and }. This parser would only be interested in two tokens ('{' and '}') and the blocks of text between them.
However, a simple regex match is not sufficient here because of the nested semantics. Take the following code:
void Foo(bool Bar)
{
    if(Bar)
    {
        f();
    }
    else
    {
        g();
    }
}
A parser will understand the overall scope of Foo, as well as each inner scope contained within Foo (the if and else blocks). As it encounters each '{' token, it "understands" its meaning. A simple search, however, does not understand the meaning behind the text and may interpret the following to be a block, which we of course know is not correct:
{
    if(Bar)
    {
        f();
    }
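To make the two-token idea concrete, here is a minimal sketch in Python (chosen only for brevity). The function name `find_blocks` is hypothetical, and the sketch assumes braces never appear inside string literals or comments, which a real C parser would have to handle.

```python
def find_blocks(source):
    """Return (start, end, depth) for every {...} block, inner blocks first."""
    stack = []   # offsets of '{' tokens that have not been closed yet
    blocks = []
    for i, ch in enumerate(source):
        if ch == '{':
            stack.append(i)
        elif ch == '}':
            if not stack:
                raise ValueError(f"unmatched '}}' at offset {i}")
            start = stack.pop()
            # len(stack) is the number of blocks still enclosing this one
            blocks.append((start, i, len(stack)))
    if stack:
        raise ValueError(f"unclosed '{{' at offset {stack[-1]}")
    return blocks


code = "void Foo(bool Bar){ if(Bar){ f(); } else { g(); } }"
for start, end, depth in find_blocks(code):
    print(depth, code[start:end + 1])
```

The stack is what gives each '{' its meaning: the if and else blocks are reported at depth 1 before the enclosing body of Foo at depth 0, and the truncated fragment above is rejected as unclosed instead of being mistaken for a block.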
Not exactly sure if it's a duplicate -- but check the following posts out:
Your question is a bit vague, but I guess my opinion is that when your regex becomes complicated or takes too long, and you have a reasonably defined "language" to deal with, a parser will be easier.
I don't think you can draw a line in the sand and say that anything on one side can be done by regex, and on the other side you need a parser. It depends on the situation.
You need to use a parser as soon as you have a problem that regular expressions are not meant to solve (or simply can't). Recursively matching balanced parentheses, for instance, is one of those problems. Even though some flavours, like PCRE, get you very far, they don't win over a hand-written parser.
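As a rough illustration of the parentheses case, the sketch below uses Python's standard `re` module, which has no recursion construct. The names `FLAT` and `is_balanced` are made up for this example. The pattern only describes one level of nesting, while a plain depth counter handles any level.

```python
import re

# Matches a parenthesised group that contains no further parentheses,
# so it describes exactly one nesting level.
FLAT = re.compile(r"\([^()]*\)")


def is_balanced(text):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:      # a ')' appeared before its '('
                return False
    return depth == 0


print(bool(FLAT.fullmatch("(a)")))      # one level: the pattern copes
print(bool(FLAT.fullmatch("((a))")))    # two levels: the pattern gives up
print(is_balanced("((a))"), is_balanced("(()"))
```

You could extend the pattern by hand to cover two or three levels, but each extra level needs a longer pattern, whereas the counter needs no change.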
There are a few compelling use cases for parsers over regular expressions. You should use a parser instead of a regular expression:
Here are some use cases, courtesy of Steve Yegge: Rich Programmer Food.
The Dragon Book has a small section about what you can't use Regular Expressions for:
For 1 and 2, there's a simple explanation: you can't capture a substring so that you can match it later. If you did, then you would be using a parser. Just think of how you would use regular expressions for those cases, and you will intuitively come to the conclusion that you can't. :)
For 3, it's the same as the problem in K&R for parsing string literals. You can't just say a string literal is between the first ' " ' and the second ' " ', because what happens when there's an escaped quote (\")?
As for the relation to Russell's paradox, I think your hunch is right, because the problem is regex's limited introspection capabilities. The book has references to the proofs. If you want to, I can look them up for you.
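The string-literal case can be sketched as a small hand-written scanner. This is only an illustration under simple assumptions: the function name `read_string_literal` is hypothetical, and a backslash is taken to escape exactly one following character.

```python
def read_string_literal(text, start):
    """Return the offset just past the closing quote of the literal at start."""
    if text[start] != '"':
        raise ValueError("no string literal starts here")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '\\':
            i += 2             # skip the backslash and the character it escapes
        elif ch == '"':
            return i + 1       # unescaped quote: the literal ends here
        else:
            i += 1
    raise ValueError("unterminated string literal")


src = r'say("a \"quoted\" word");'
start = src.index('"')
naive_end = src.index('"', start + 1) + 1     # "first quote to second quote"
print(src[start:naive_end])                   # cut short at the escaped quote
print(src[start:read_string_literal(src, start)])
```

The naive "first quote to second quote" rule stops at the escaped quote, while the scanner keeps going because it knows what the backslash means.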
There are things that a regex cannot do but a parser can.
For example:
Start ::= (Inner);
Inner ::= Start | x;
A regular expression wouldn't be able to do that, because a regex can't track whether there are the same number of opening and closing parentheses. That is why a parser is the expected tool when you are trying to tokenize and parse a large file, while a regex can simply find particular patterns inside the file.
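A recursive-descent recognizer for this grammar takes only a few lines. The sketch below is in Python and assumes the trailing ';' in each rule is a rule terminator, not part of the language, so the accepted strings are x wrapped in one or more matching pairs of parentheses. The function names are invented for the example.

```python
def parse_start(text, pos=0):
    """Start ::= '(' Inner ')'. Return the offset just past the match."""
    if pos >= len(text) or text[pos] != '(':
        raise SyntaxError(f"expected '(' at offset {pos}")
    pos = parse_inner(text, pos + 1)
    if pos >= len(text) or text[pos] != ')':
        raise SyntaxError(f"expected ')' at offset {pos}")
    return pos + 1


def parse_inner(text, pos):
    """Inner ::= Start | 'x'. Return the offset just past the match."""
    if pos < len(text) and text[pos] == 'x':
        return pos + 1
    return parse_start(text, pos)    # recursion is what tracks the nesting


def accepts(text):
    """Return True if the whole string is derivable from Start."""
    try:
        return parse_start(text) == len(text)
    except SyntaxError:
        return False


for sample in ["(x)", "(((x)))", "((x)", "(x))"]:
    print(sample, accepts(sample))
```

Each function mirrors one grammar rule, and the call stack remembers how many parentheses are still open, which is exactly the bookkeeping a plain regular expression lacks.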