views:

202

answers:

5

I'm reading Flex & Bison from O'Reilly, and would like to know if learning Regular Expressions beforehand will help in developing a programming language?

+11  A: 

Regular expressions can be defined using formal language theory, so they're complementary concepts.

It would be a good idea to have a good understanding of both regular expressions and formal language theory before starting to build a language.

So to answer your Boolean question: Yes.

Ben S
I cannot +1 this answer hard enough. Without an understanding of formal language theory fundamentals you can build a language, but it will be terrible.
Welbog
It will also likely be ambiguous. Which can result in unpredictable behavior in your compiler.
Ben S
+1 for the "So to answer your Boolean question, yes" part :P
Seth Illgard
I'd like to stress that you should know formal language theory in order to develop a programming language. Know at least what a regular language is (general programming languages generally are not), what a context-free language is.
Svante
REs are theoretically important in the hierarchy of languages, and they are useful for lexical analysis (viz. flex); but in the big picture they are of extremely minor significance for almost any language that anyone would want to make -- they correspond to a tiny subset of simple languages. Typical formal languages that we like to work with are usually nearer to CFGs, and these are what you should be learning about for language (at least, grammar) design. Otherwise REs are relevant to flex, if you happen to want to use that for lexical analysis (not everyone does).
Edmund
+3  A: 

Regular expressions for conventional programming languages' syntax are quite simple, so, strictly speaking, you don't need to be a regexp expert to write a compiler. On the other side, regexp belongs to basic programming skills, so I'd say you need to know them... for pretty much everything.

stereofrog
Gonna have to -1 here. Regular expressions are a subject in computability, not just a perl library.
Peter Turner
Automata theory is "a subject in computability". Regexps is just a perl library.
stereofrog
A: 

If you were creating an interpreted language, you can use regex to identify the various atoms in a line of code.

RobKohr
I cut out the paragraph about the need to learn regex for general programming. The slimmed down answer is more on topic.
RobKohr
A: 

Maybe I'm off track because the other answerers think you're asking about PCRE or something. But if you're talking about inventing a language, then regular expressions are about as important as the syntax and anything else.

Regular Expressions are a step on the Chomsky Hierarchy between Push Down Automata and Deterministic Finite Automata, very important stuff to know about and exceptionally necessary when parsing anything, especially code.

Peter Turner
PDAs are higher than DFAs, the state machines for deciding regular languages. You have your hierarchy in the wrong order.
Welbog
I thought Regexes were higher than DFA's. I know pushdown automatons are.
Peter Turner
Regexes have the same expressibility as DFAs. They decide the same domain of languages. DFAs are the *implementation of* regular expressions. Unless you're thinking of Perl "regular" expressions which aren't actually regular expressions at all. Those are, in fact, a step higher than actual regular expressions. The hierarchy is visible here: http://en.wikipedia.org/wiki/Chomsky_hierarchy
Welbog
+1  A: 

I'd say so. Sounds like you've run across the Flex scanner in Example 1.3 of Flex & Bison (p. 5):

/* recognize tokens for the calculator and print them out */
%%
"+"      { printf("PLUS\n"); }
"-"      { printf("MINUS\n"); }
"*"      { printf("TIMES\n"); }
"/"      { printf("DIVIDE\n"); }
"|"      { printf("ABS\n"); }
[0-9]+   { printf("NUMBER %s\n", yytext); }
\n       { printf("NEWLINE\n"); }
[ \t]    { }
.        { printf("Mystery character %s\n", yytext); }
%%

As you've seen, NUMBER, whitespace, and mystery character are defined using simple regular expressions (well, the others are too, but they're not very interesting). Your programming language will doubtless use other regexes (eg, think about tokens for hex literals, octal literals, float/doubles and comments in C/C++/Java). They're also a useful technique for programming in general, so I'd go ahead and learn something about them now.

Jim Ferrans