views:

58

answers:

3

Hi all,

I am in the scanning phase of building a compiler. I wonder whether I should read the entire file content before processing it. I think that would be better, since my compiler may need to do some optimization later (so I don't need to reread the file). But if the input program is quite big, it could take a lot of memory to hold the whole file content.

Need some more ideas and discussion.

Thanks.

+1  A: 

Optimization should not normally require a second pass over the actual source code. The first thing you should do is tokenise it and then work on the tokenised version. The only reason for hanging on to the source is if you need to reproduce it exactly in your error messages, which you probably don't.
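To illustrate the "tokenise first, then work on tokens" idea, here is a minimal sketch in Python (the token names and regexes are hypothetical, not from the answer):

```python
import re

# Token specification: more specific patterns first.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Turn source text into (kind, lexeme) pairs.
    All later passes (parsing, optimization) work on these, not on the raw text."""
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

# list(tokenize("x = 2 + 40"))
# → [("IDENT", "x"), ("OP", "="), ("NUMBER", "2"), ("OP", "+"), ("NUMBER", "40")]
```

Once the source has been reduced to pairs like these, the original text is no longer needed except, as noted, for verbatim error reporting.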

anon
+1  A: 

Usually, the first thing you do is lexical analysis, in which you split the input file into tokens. Then you build a symbol table and an abstract syntax tree. Any optimization or code generation then works on these intermediate data structures rather than on the original input file. Hence, I see no point in completely reading and buffering the input file.
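The "no need to buffer the whole file" point can be sketched as a lexer that reads one line at a time (a toy example; the regex and names are my own, not part of the answer):

```python
import io
import re

TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|[+\-*/=()]")

def tokens_from(stream):
    """Lazily yield tokens line by line.
    Only one line of source is ever held in memory, never the whole file."""
    for line in stream:
        yield from TOKEN_RE.findall(line)

# Works the same on a real file object or an in-memory stream:
src = io.StringIO("a = 1\nb = a + 2\n")
# list(tokens_from(src)) → ['a', '=', '1', 'b', '=', 'a', '+', '2']
```

A parser can consume this generator directly, so memory use depends on the size of the intermediate structures, not on the size of the source file.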

Niels Lohmann
+1  A: 

Optimizations would happen on the Abstract Syntax Tree or some later intermediate representation, not on the source code. And the AST will definitely need to fit entirely in memory. The source code doesn't, because it can be transformed into the AST on-the-fly.
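A sketch of that on-the-fly transformation, assuming a deliberately tiny grammar of numbers joined by `+` (the function names and AST shape are hypothetical): the parser pulls from a lazy token stream, so only the growing AST lives in memory, not the source text.

```python
import io
import re

def tokens(stream):
    # Lazy lexer: one line of source in memory at a time.
    for line in stream:
        yield from re.findall(r"\d+|\+", line)

def parse_sum(token_iter):
    """Build an AST for 'n + n + ...' as nested ('+', left, right) tuples.
    Tokens are consumed as they arrive; the source is never fully buffered."""
    it = iter(token_iter)
    node = int(next(it))
    for t in it:
        if t == "+":
            node = ("+", node, int(next(it)))
    return node

src = io.StringIO("1 +\n2 + 3\n")
# parse_sum(tokens(src)) → ('+', ('+', 1, 2), 3)
```

Optimizations such as constant folding would then rewrite this tuple tree, never touching the file again.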

Pascal Cuoq