ansaurus

Question

Answer 1

+6 A:

What you should actually do is build an Abstract Syntax Tree (AST).

It is a representation of the source code in the form of a tree, that is much easier to work with, especially to make transformations and optimizations.

That code, represented as a tree, would be something like:

(+
    (+
        x
        (U f))
    (+
        (V f)
        y))

You could then try to make some transformations: a sum of sums is a sum of all the terms:

(+
    x
    (U f)
    (V f)
    y)

Then you could scan the tree and you could have the following rules:

(+ (U x) (V x)) = 0, for all x
(+ 0 x1 x2 ...) = x, for all x1, x2, ...

Then you would obtain what you are looking for:

(+ x y)

Any good book on compiler-writing will discuss a lot on ASTs. Functional programming languages are specially suited for this task, since in general it is easy to represent trees and to do pattern matching to parse and transform the tree.

Usually, for this task, you should avoid using regular expressions. Regular expressions define what mathematicians call regular languages. Any regular language can be parsed by a set of regular expressions. However, I think your language is not regular, so it cannot be properly parsed by regexps.

People try, and try, and try to parse languages such as HTML using regular expressions. This has been extensively discussed here in SO, and you cannot parse HTML with regular expressions. There will always be an exceptional case in which your regular expressions would fail, and you would have to adapt it.

It might be the same with your language: if it is not regular, you should avoid lots of headaches and not try to parse it (and especially "transform" it) using regular expressions.

Bruno Reis 2010-03-23 23:17:00

Yes, I vaguely use the concepts of Abstract Syntax Tree (but I don't call it that way, because I do not have any syntax). You say "then you could scan the tree and you could have the following rules": so you think it's good to use regular expressions, as my idea was?

Pindatjuh 2010-03-23 23:20:21

No, regular expressions might not be a good idea. I will comment on that on my answer.

Bruno Reis 2010-03-23 23:21:28

Thank you. All level's languages are regular.

Pindatjuh 2010-03-23 23:26:52

If they are all regular, you *could* try the hard, err, regular-expression way. However, I'm pretty sure that it would be a lot easier to construct a lexer and parser, that would transform the source into a AST, then optimize it. I pretty sure some can suggest you a port of Lex and Yacc for Java, if you have the possibility to go that way! Anyway, good luck with your compiler.

Bruno Reis 2010-03-23 23:34:27

It's not quite Java, but there's a book called "Beginning F#", by Robert Pickering. In his book, chapter 12 (11? don't remember) is dedicated to build a compiler. Only one chapter that teaches you how to do that, exploiting the pattern-matching power of functional languages (in this case, F#).

Bruno Reis 2010-03-23 23:36:15

I've updated the question. I presume that you've misinterpreted my question. Thanks for providing the, very useful in a traditional compiler, information.

Pindatjuh 2010-03-23 23:37:07

Answer 2

+3 A:

I'm having a lot of trouble understanding this question, but I think you will find it useful to learn something about term-rewriting systems, which seems to be what you are proposing. Whether the mechanism is tree rewriting (always works) or regular expressions (will work for some languages some of the time and other languages all of the time) is of secondary importance.

It is definitely possible to optimize object code by term rewriting. You probably also will benefit from learning something about peephole optimization; a good place to start, because it is very strong on the fundamentals, is a paper by Davidson and Fraser on a retargetable peephole optimizer. There's also excellent later work by Benitez and Davidson.

Norman Ramsey 2010-03-24 00:00:22

Thank you! I'll go for peephole optimization, and I guess I can implement it using regular expressions. "It's definately possible to optimize object code by term rewriting"; that was the answer I was looking for.

Pindatjuh 2010-03-24 13:39:39

ansaurus

tags:

views:

answers:

(x86) Assembler Optimization

related questions