tags:

views:

993

answers:

12

This is probably a silly question to people knowledgeable about compilers, but I'm honestly ignorant about the answer.

Is C++ syntax context-sensitive? In other words, is C++ syntax not expressible with context-free grammars?

Obviously C++ is parsed with regular parsing tools and that parsing is not particularly ineffective (!), but I do recall reading something about this (I don't remember where) and I thought I'd get my facts straight with the community before spreading lies... Maybe a subset of C++ is CS? Maybe the new spec? Any help is appreciated.

EDIT: Thanks for all the responses. However, for completeness' sake, what part of compilation deals with the ambiguity? A CFG is a grammar that has only one non-terminal symbol on the left-hand-side of the rule (eg. A->zC), a CSG is one that has a terminal (plus a non-terminal) on the left-hand-side (aAv->QT), where uppercase letters are nonterminals and lowercase are terminals. Is any representation like the latter in the grammar parsing C++ source code?

Thanks.

+3  A: 

True :)

J. Stanley Warford. Computer systems. Pages 341-346.

anno
+3  A: 

C++ is parsed with GLR parser. That means during parsing the source code, the parser may encounter ambiguity but it should continue and decide which grammar rule to use later.

look also,

Why C++ cannot be parsed with a LR(1) parser?


Remember that context-free grammar can not describe ALL the rules of a programming language syntax. For example, Attribute grammar is used to check the validity of an expression type.

int x;
x = 9 + 1.0;

You can not describe the following rule with context-free grammar : The Right Side of the assignment should be of the same type of the Left Hand side.

AraK
Most C++ parsers do not use GLR parsing technology. GCC doesn't.Some do. See http://www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html for one that does.
Ira Baxter
A: 

The grammar of a compiler is build on tokens which are scanned by the lexical analysis. Therefore, it is irrelevant for the grammar if the keywords are context-sensitive or not, since this would have been already handled (i.e. both class and CLASS being the same token coming from the lexical analysis).

I know this is somewhat bailing out of your question of context-free grammars, however, this is how compilers are build which allows several exceptions to strict context free principles (i.e. forward declarations are not context-free, but possible through the use of symbol tables).

Hope this answers the question.

txwikinger
+8  A: 

In

B = A();

is A() a function call or an object construction?

That depends.

Not to mention the effect of typedefs.

dmckee
shouldn't there be a new before the class if it would be an object construction?
txwikinger
@txwikinger: You can instantiate variables on the stack in this fashion without using new.
Eric
A: 

C++ is not context free. I learned it some time ago in compilers lecture. A quick search gave this link, where the "Syntax or semantics" section explains why C and C++ are not context free:

Wikipedia Talk: Context-Free grammar

Regards,
Ovanes

ovanes
+10  A: 

Yes. The following expression has a different order of operations depending on type resolved context:

Edit: When the actual order of operation varies, it makes it incredibly difficult to use a "regular" compiler that parses to an undecorated AST before decorating it (propagating type information). Other context sensitive things mentioned are "rather easy" compared to this (not that template evaluation is at all easy).

#if FIRST_MEANING
   template<bool B>
   class foo
   { };
#else
   static const int foo = 0;
   static const int bar = 15;
#endif

Followed by:

static int foobar( foo < 2 ? 1 < 1 : 0 > & bar );
280Z28
Pure evil - ouch!
Jonathan Leffler
+4  A: 

You might want to take a look at The Design & Evolution of C++, by Bjarne Stroustrup. In it he describes his problems trying to use yacc (or similar) to parse an early version of C++, and wishing he had used recursive descent instead.

anon
Wow... Thanks. I wonder if it really makes sense to **think** about using anything more powerful than a CFG to parse any artificial language.
Dervin Thunk
Great book for understanding the whys of C++. I recommend that and Lippman's Inside the C++ Object Model for understanding how C++ works. Though both are a bit dated they are still a good read.
Matt Price
"Meta-S" is a context-sensitive parsing engine by Quinn Tyler Jackson. I've not used it, but he tells an impressive story.Check out his comments in comp.compilers, and see http://www.rnaparse.com/MetaS%20defined.htm
Ira Baxter
@IraBaxter: your x-ref is MIA today - and solid references to the software seem to be elusive (Google search doesn't provide any good leads, either with 'site:rnaparse.com meta-s' or 'quinn jackson meta-s'; there are bits and pieces, but http://meta-s.com leads to a non-informative web-site, for example).
Jonathan Leffler
A: 

Obviously, if you take the question verbatim, nearly all languages with identifiers are context sensitive.

One need to know if an identifier is a type name (a class name, a name introduced by typedef, a typename template parameter), a template name or some other name to be able to correctly some of the use of identifier. For instance:

x = (name)(expression);

is a cast if name is a type name and a function call if name is a function name. Another case is the so called "most vexing parse" where it isn't possible to differentiate variable definition and function declaration (there is a rule saying it is a function declaration).

That difficulty has introduced the need of typename and template with dependent names. The rest of C++ isn't context sensitive as far as I know (i.e. it is possible to write a context free grammar for it).

AProgrammer
A: 

Yes, it is context-sensitive. For instance, take operator <<

It can be either a bit shift operator or a stream operator, depending on the context.

Nemanja Trifunovic
That is not what is meant by context sensitivity when it comes to grammars.
anon
+1  A: 

Meta-S" is a context-sensitive parsing engine by Quinn Tyler Jackson. I've not used it, but he tells an impressive story. Check out his comments in comp.compilers, and see rnaparse.com/MetaS%20defined.htm – Ira Baxter Jul 25 at 10:42

The correct link is parsing enigines

Meta-S was the property of a defunct company called Thothic. I can send a free copy of the Meta-S to anyone interested and I've used it in rna parsing research. Please note the "pseudoknot grammar" included in the examples folders was written by an non-bioinformatics, amature programmer and basically doesn't work. My grammars take a different approach and work quite well.

james Lynn
This is actually an interesting find.
Dervin Thunk
A: 

C++ templates have been shown to be Turing Powerful. Although not a formal reference, here's a place to look in that regard:

http://cpptruths.blogspot.com/2005/11/c-templates-are-turing-complete.html

I will venture a guess (as old as a folkoric and concise CACM proof showing that ALGOL in the 60's could not be reprsented by a CFG) and say that C++ cannot therefore be correctly parsed only by a CFG. CFGs, in conjunction with various TP mechanisms in either a tree pass or during reduction events -- this is another story. In a general sense, due to the Halting Problem, there exists some C++ program that cannot be shown to be correct/incorrect but is nonetheless correct/incorrect.

{PS- As the author of Meta-S (mentioned by several people above) -- I can most assuredly say that Thothic is neither defunct, nor is the software available for free. Perhaps I have worded this version of my response such that I do not get deleted or voted down to -3.}

Quinn Tyler Jackson