views:

214

answers:

5

I read in Sebesta's book that the compiler spends most of its time lexing the source code, so optimizing the lexer is a necessity, unlike the syntax analyzer.

If this is true, why does the lexical analysis stage take so much time compared to syntax analysis in general?

By syntax analysis I mean the derivation process.

A: 

Why wouldn't you optimize both?

bbqchickenrobot
This is not an answer.
Eliseo Ocampos
Very perceptive, Eliseo. You're right, it's a question! ;)
bbqchickenrobot
If it's not an answer, **delete** it.
Brad Gilbert
It's a rhetorical question/answer. Get over it.
bbqchickenrobot
+6  A: 

First, I don't think it actually is true: in many compilers, most time is not spent lexing source code. For example, in C++ compilers (e.g. g++), most time is spent in semantic analysis, in particular in overload resolution (trying to figure out which implicit template instantiations to perform). Also, in C and C++, most time is often spent in optimization (creating graph representations of individual functions or of the whole translation unit, and then running long algorithms on these graphs).

When comparing lexical and syntactical analysis, it may indeed be the case that lexical analysis is more expensive. This is because both use state machines, i.e. there is a fixed number of actions per element, but the number of elements is much larger in lexical analysis (characters) than in syntactical analysis (tokens).

Martin v. Löwis
I'm not at all sure, but maybe it's also relevant that there's a greater variety of character values than there is of lexed token types (e.g. 2^16 possible character values compared with only around 100 token types).
ChrisW
Furthermore, given any particular token, there's quite a small number of token types (perhaps even only one token type) which can legally follow this type of token. I.e., the switch statement for each token in the syntactic analyzer has far fewer cases in it than the switch statements for each character in the lexer.
ChrisW
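To make that comparison concrete, here is a minimal sketch in C. It is not taken from any of the answers above; the token kinds and the input string are invented for illustration. The lexer examines every character of the source, while the loop in `main` stands in for the parser, consuming one token at a time and simply counting them.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Toy token kinds -- purely illustrative, not from the answers above. */
enum kind { IDENT, NUMBER, PUNCT, END };

struct token { enum kind k; const char *start; int len; };

/* Lexer: one pass over every CHARACTER of the input. Each character is
   examined at least once, so the work is proportional to the length of
   the source text. */
static const char *lex(const char *p, struct token *t) {
    while (isspace((unsigned char)*p)) p++;          /* skip whitespace */
    t->start = p;
    if (*p == '\0') { t->k = END; t->len = 0; return p; }
    if (isalpha((unsigned char)*p)) {                /* identifier or keyword */
        while (isalnum((unsigned char)*p)) p++;
        t->k = IDENT;
    } else if (isdigit((unsigned char)*p)) {         /* number literal */
        while (isdigit((unsigned char)*p)) p++;
        t->k = NUMBER;
    } else {                                         /* single punctuation char */
        p++;
        t->k = PUNCT;
    }
    t->len = (int)(p - t->start);
    return p;
}

int main(void) {
    const char *src = "foreach (x in o) { total = total + 1; }";
    struct token t;
    const char *p = src;
    int chars = (int)strlen(src), tokens = 0;

    /* The parser side of the comparison only sees TOKENS, of which there
       are far fewer than characters. Here we just count them; a real
       parser would also do a fixed amount of work per token. */
    do {
        p = lex(p, &t);
        if (t.k != END) tokens++;
    } while (t.k != END);

    printf("characters examined by the lexer: %d\n", chars);
    printf("tokens handed to the parser:      %d\n", tokens);
    return 0;
}
```

For this input the lexer inspects every one of the 39 characters, while the parser-side loop handles only 14 tokens, which is the asymmetry the answer describes.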
A: 

Lexical analysis is the process whereby all the characters in the source code are converted to tokens. For instance

foreach (x in o)

is read character by character - "f", "o", etc.

The lexical analyser must determine which keyword is being seen ("foreach", not "for", and so on).

By the time syntactic analysis occurs the program code is "just" a series of tokens. That said, I agree with the answer above that lexical analysis is not necessarily the most time-consuming process, just that it has the biggest stream to work with.

DavidMWilliams
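To illustrate the keyword point, here is a small hand-written sketch in C (the keyword table is made up, not taken from this answer). It shows the usual "maximal munch" approach: the lexer first consumes the whole run of letters and only then checks the result against a keyword table, so "foreach" is never mistaken for "for" followed by "each".

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Illustrative keyword table; the set of keywords is made up. */
static const char *keywords[] = { "foreach", "for", "in", "if", "while" };

/* Read one run of letters starting at *pp and advance past it.
   The whole word is consumed before any keyword check is made. */
static void scan_word(const char **pp, char *buf, size_t bufsz) {
    size_t n = 0;
    while (isalpha((unsigned char)**pp)) {
        if (n + 1 < bufsz) buf[n++] = **pp;
        (*pp)++;
    }
    buf[n] = '\0';
}

static int is_keyword(const char *word) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(word, keywords[i]) == 0) return 1;
    return 0;
}

int main(void) {
    const char *p = "foreach";
    char word[64];
    scan_word(&p, word, sizeof word);
    printf("\"%s\" -> %s\n", word, is_keyword(word) ? "keyword" : "identifier");
    return 0;
}
```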
A: 

It really depends on where you draw the line between lexing and parsing. I tend to have a very limited view of what a token is, and as a result my compilers spend a lot more time on parsing than on lexing, not because the lexers are faster, but because they simply do less.

Imagist
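As a sketch of what such a "limited" lexer might look like (the split shown here is hypothetical, not Imagist's actual code): every run of letters is emitted as a generic WORD token, and the decision about which words are keywords is deferred to the parser.

```c
#include <stdio.h>
#include <string.h>

/* With a minimal lexer, every word is just a WORD token; the parser
   decides later whether a WORD is a keyword. The keyword set and the
   division of labour here are purely illustrative. */
static int parser_treats_as_keyword(const char *word) {
    /* keyword check moved out of the lexer and into the parser */
    return strcmp(word, "foreach") == 0 || strcmp(word, "in") == 0;
}

int main(void) {
    const char *words[] = { "foreach", "x", "in", "o" };
    for (size_t i = 0; i < 4; i++)
        printf("WORD \"%s\" -> parser sees %s\n", words[i],
               parser_treats_as_keyword(words[i]) ? "a keyword" : "an identifier");
    return 0;
}
```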
A: 

It certainly used to be the case that lexing was expensive. Part of that had to do with limited memory and with doing multiple file operations to read in bits of the program. Now that memory is measured in GB, this is no longer an issue, and for the same reason a lot more work can be done, so optimization has become more important. Of course, whether the optimization helps much is another question.