I'm building a compiler/assembler/linker in Java for the x86-32 (IA32) processor targeting Windows.
High-level concepts (I do not have any "source code": there is no syntax nor lexical translation, and all languages are regular) are translated into opcodes, which then are wrapped and outputted to a file. The translation process has several phases, one is the translation between regular languages: the highest-level code is translated into the medium-level code which is then translated into the lowest-level code (probably more than 3 levels).
My problem is the following; if I have higher-level code (X
and Y
) translated to lower-level code (x
, y
, U
and V
), then an example of such a translation is, in pseudo-code:
x + U(f) // generated by X
+
V(f) + y // generated by Y
(An easy example) where V
is the opposite of U
(compare with a stack push as U
and a pop as V
). This needs to be 'optimized' into:
x + y
(essentially removing the "useless" code)
My idea was to use regular expressions. For the above case, it'll be a regular expression looking like this: x:(U(x)+V(x)):null
, meaning for all x
find U(x)
followed by V(x)
and replace by null
. Imagine more complex regular expressions, for more complex optimizations. This should work on all levels.
What do you suggest? What would be a good approach to optimize and produce fast x86 assembly?