I did some searching and didn't find a question that directly answers this one.

The basic gist: which language features or syntax make a language a major pain to build tooling for, such as a parser or a syntax highlighter?

This might be subjective, but I was thinking of, for example, the difference between parsing a language like Lisp, with its (func params etc..) structure, and parsing something like C++, with all of its templates, brackets and so forth.

+4  A: 

Languages that support syntax extension through macros or other means cannot be fully parsed unless you can properly expand the macros. For languages with full procedural macros, such as Lisp or Curl, you can't fully parse the code without implementing the language itself!

Typically, when syntax highlighting such languages, you don't try to expand macros; you just assume that they follow conventional language idioms.
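As a rough illustration of the "don't expand macros" approach, here is a minimal sketch in Python of a reader for a Lisp-like parenthesized syntax. The names `tokenize` and `read` are made up for this example, and it assumes atoms contain no spaces or parentheses (so no string literals). It only builds nested lists, so a macro call such as `when` is treated like any other form and never expanded.

```python
def tokenize(text):
    """Split source text into '(' , ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Build a nested list from a token list, consuming it from the front."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens and tokens[0] != ")":
            form.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ')'
        return form
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token  # an atom, kept as a plain string


# 'when' may well be a macro, but the reader does not need to know that.
tree = read(tokenize("(when (> x 0) (print x))"))
```

This is enough for highlighting or structural editing, but it says nothing about what the forms mean once macros have rewritten them.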

Christopher Barber
Ah, that does make sense! I never thought about the problems of parsing languages that can extend their own syntax; that would indeed be a thorny problem. However, I'm also wondering about some of the "more normal" syntax choices, such as Python's whitespace versus C's bracket style. How would that affect parsing?
Pharaun
+1  A: 

From the point of view of formal languages and grammars there are two main aspects, IMHO. First, the grammar for your language should belong to a category that is easy to process. A context-free grammar, for example, can describe a language with two elements whose counts depend on each other, such as opening and closing brackets, and parsing that may need a potentially unbounded amount of memory. C++ has a context-sensitive grammar, which is even worse; an example would be a grammar with three elements whose counts all depend on each other. The second aspect is ambiguity. With an ambiguous grammar the same text can be parsed in more than one way, so your parsing algorithm has to find the right one, and most algorithms do not allow ambiguity at all.
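To make the ambiguity point concrete, here is a small sketch in Python. It assumes a toy grammar that is not taken from any real language, `E -> E '-' E | NUMBER`, and a hypothetical helper `all_parses` that enumerates every parse tree for a chain of subtractions together with the value each tree evaluates to.

```python
def all_parses(numbers):
    """Return every parse of n1 - n2 - ... under E -> E '-' E | NUMBER.

    Each result is a (tree_text, value) pair.
    """
    if len(numbers) == 1:
        return [(str(numbers[0]), numbers[0])]
    results = []
    # Any '-' may be chosen as the top-level operator; that is the ambiguity.
    for split in range(1, len(numbers)):
        for left_text, left_value in all_parses(numbers[:split]):
            for right_text, right_value in all_parses(numbers[split:]):
                tree_text = f"({left_text} - {right_text})"
                results.append((tree_text, left_value - right_value))
    return results


# The input "1 - 2 - 3" has two parse trees with different values.
parses = all_parses([1, 2, 3])
# parses == [("(1 - (2 - 3))", 2), ("((1 - 2) - 3)", -4)]
```

A practical grammar avoids this by building the associativity into the rules, for example `E -> E '-' NUMBER | NUMBER`, so that only the second tree is possible.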

I am not entirely sure, but I would say that parsing brackets and whitespace (when reasonably defined) is equally complex. In both cases you need a counter to track the level of block nesting. With whitespace, however, you can determine the level locally (by counting the leading whitespace), and you can be sure the counter never goes below zero, which can happen with brackets when there are more closing brackets than opening ones.
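The two counters could be sketched like this in Python. Both functions are hypothetical illustrations: `check_braces` ignores strings and comments, and `indent_levels` assumes indentation is always a multiple of a fixed `width` in spaces, which is stricter than what Python itself accepts.

```python
def check_braces(text):
    """Return the final nesting depth; raise if a '}' has no matching '{'."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # only possible with bracket-delimited blocks
                raise SyntaxError("'}' without matching '{'")
    return depth


def indent_levels(lines, width=4):
    """Return the nesting level of each non-blank line from its leading spaces."""
    levels = []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no nesting information
        spaces = len(line) - len(line.lstrip(" "))
        if spaces % width:
            raise SyntaxError(f"indentation is not a multiple of {width}")
        levels.append(spaces // width)  # known from this line alone
    return levels


levels = indent_levels(["if x:", "    y", "        z", "w"])
# levels == [0, 1, 2, 0]
```

The bracket counter depends on everything read so far and can go negative, while each indentation level is computed from a single line and is never below zero.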

Gabriel Ščerbák