views: 59
answers: 3

In all of the standard specifications for programming languages, why is it that you cannot directly translate the lexical analysis/grammar layout into a grammar that is ready to be plugged into a tool and just works?

I can understand that it might be impossible to adapt it for the likes of Flex/Bison, Lex/Yacc, ANTLR and so on while still keeping it readable for humans to understand.

But surely, if it is a standard specification, copying and pasting the grammar layout should just work; instead you end up with loads of shift/reduce errors as a result, which can backfire and hence produce an inaccurate grammar.
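For example (a hypothetical Bison-style fragment I made up, not copied from any actual standard), even the classic "dangling else" rule, written the way a specification would naturally write it, already makes Bison complain:

    /* Sketch only: the token names are invented and there is no lexer or
       semantic action here.  Bison accepts the file but warns of one
       shift/reduce conflict, because after "IF ( EXPR ) IF ( EXPR ) stmt"
       it cannot tell which IF a following ELSE belongs to. */
    %token IF ELSE EXPR OTHER

    %%

    stmt : IF '(' EXPR ')' stmt
         | IF '(' EXPR ')' stmt ELSE stmt
         | OTHER
         ;

    %%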

In other words, why did they not make it readable for use by a grammar/parser tool straight-away?

Maybe it is a debatable thing, I don't know...

Thanks, Best regards, Tom.

+1  A: 

In other words, why did they not make it readable for use by a grammar/parser tool straight-away?

Standards documents are intended to be readable by humans, not parser generators.

anon
@Neil: Thanks for the prompt answer, but do compiler writers feed parts of it into a grammar/parser generator and hand-fix those that are problematic?
tommieb75
@tommieb75 No. Writing a grammar that is acceptable to a parser generator is a special skill - there are all sorts of tricks and accommodations you have to make. Such a grammar would be very difficult for humans to read, so in practice the grammars used in standards cannot be fed into generators.
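As a made-up sketch of the sort of accommodation I mean: a reference grammar will state a list naturally with left recursion, which an LL-based tool (classic ANTLR, or a hand-written recursive-descent parser) cannot take as-is, so the grammar writer restates it in a form that obscures the original intent:

    /* Hypothetical fragment; "arg" and the tokens are assumed to be
       defined elsewhere.  Reference-grammar style, left-recursive: */
    arg_list : arg_list ',' arg
             | arg
             ;

    /* The same rule rewritten for an LL-based tool: right recursion
       plus an explicitly empty tail, purely to satisfy the generator. */
    arg_list      : arg arg_list_tail ;
    arg_list_tail : ',' arg arg_list_tail
                  | /* empty */
                  ;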
anon
@Neil: Thanks for your answer...+1 for your succinct answer ;)
tommieb75
+1  A: 

It is easy for humans to look at a grammar and know what the author intended; a computer, however, needs a lot more hand-holding along the way.

Specifically, these specifications are generally not LL(1) or LR(1). As such, lookahead is needed and conflicts need to be resolved. True, this could be done in the language specification, but then it is source code for a parser, not a language specification.
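A hypothetical Bison-flavoured sketch of that hand-holding: a specification is happy to write E -> E + E | E * E and explain operator precedence in prose, but to push the same rules through a generator you have to add tool-specific declarations that resolve the resulting conflicts, and at that point you are writing source code rather than a specification:

    %token NUM

    /* Without the next two lines Bison reports a pile of shift/reduce
       conflicts; with them, the ambiguity is resolved by declarations
       that exist purely for the benefit of the tool. */
    %left '+' '-'
    %left '*' '/'

    %%

    expr : expr '+' expr
         | expr '-' expr
         | expr '*' expr
         | expr '/' expr
         | NUM
         ;

    %%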

samoz
@Samoz: it would be nice if there were such a thing, wouldn't you think so?
tommieb75
@Samoz: Thanks for your answer! :)
tommieb75
A: 

I agree with your sentiment, but the guys writing standards can't win on this.

To make the lexer/grammar work for a parser generator directly out of the standard, the standard writers would have to choose a specific one. (What choice would the COBOL standards folks have made in 1959?)

The popular ones (LEX, YACC, etc.) are often not capable of handling reference grammars, which are written for succinctness and clarity, and so would be a poor (in fact, non-)choice.

More exotic ones (Earley, GLR) might be more effective because they allow infinite lookahead and ambiguity, but they are harder to find. So even if a specific tool like this were chosen, you would not get what you wanted, which is a grammar that works with the parser generator you happen to have.
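(As a rough, hypothetical sketch of the difference: even Bison has grown a GLR mode. With the single directive below it will still warn about the conflicts in an ambiguous reference-style grammar, but the generated parser explores the conflicting actions in parallel at run time instead of baking in a static choice; any ambiguity that survives has to be settled with %dprec/%merge or gets reported on the offending input.)

    /* Hypothetical: the ambiguous expression grammar, accepted under GLR. */
    %glr-parser
    %token NUM

    %%

    expr : expr '+' expr
         | expr '*' expr
         | NUM
         ;

    %%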

Having said that, the DMS Software Reengineering Toolkit uses a GLR parser generator. We don't have to massage reference grammars a lot to get them to work, and DMS now handles a lot of languages, including ones that are famously hard such as C++. IMHO, this is as close to your ideal as you are likely to get.

Ira Baxter
Also, in theory at least, the GLR and Earley algorithms are exponential algorithms, while the algorithms for LL(k) and LALR are both linear.
Joel
Actually, GLR and Earley are O(N^3), not exponential, and have near-linear average behavior. LL(k)/LALR are linear for those grammars which fit in their category, sure, but if they can't process your grammar then they aren't a reasonable proposal. I have met a *lot* of programming languages in 40 years, and the only ones I recall being LALR out of the box were Wirth's original Pascal (not the one that made it into practice) and LISP. So LL(k)/LALR aren't the answer for reference grammars, unless you insist that all reference grammars be in that category. So much for succinctness.
Ira Baxter