views:

235

answers:

3

I have not gotten into the field of formal languages in computer science yet, so maybe my question is silly. I am writing a simple NMEA parser in C++, and I have to choose:

My first idea was to build a simple finite state machine manually, but then I thought that maybe I could do it with less work, even more efficiently. I used regular expressions before, but I think the NMEA regular expression is very long and should take "long time" to match it.

Then I thought about using a parser generator. I think all use the same method: they generate a FSA. But I don't know which is more efficient. When do you normally use parser generators instead of regexes (I think you could write regex in parser generator)?

Please explain the differences, I'm interested in both theory and experience.

+4  A: 

Regex is a parser-generator.

From wikipedia:

Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

If you're going over a list that only needs to be gone over once, then save the list to a file and read it from there. If you're checking things that are different every time, use regex and store the results in an array or something.

It's much faster than you would assume it to be. I've seen expressions bigger than this post.

Adding that you can nest as much as you'd like, in whatever language you decide to code it in. You could even do it in sections, for maximum re-usability.

Sneakyness
+7  A: 

Well, a simple rule of thumb is: If the grammar of the data you are trying to parse is regular, use regular expressions. If it is not, regular expressions may still work (as most regex engines also support non-regular grammars), but it might well be painful (complicated / bad performance).

Another aspect is what you are trying to do with the parsed data. If you are only interested in one field, a regex is probably easier to read. If you need to read deeply nested structures, a parser is likely to be more maintainable.

sleske
+2  A: 

As Sneakyness points out, you can have a large and complicated regular expression that is surprisingly powerful. I've seen some examples of this, but none were maintainable by mere mortals. Even using Expresso only helped so much; it was still difficult to understand and risky to modify. So unless you're a savant with a fixation on Grep, I would not recommend this direction.

Instead, consider focusing on the grammar and letting a compiler compiler do the heavy lifting for you.

Steven Sudit