Hi, I am working on a very simple decompiler for the MIPS architecture, and as I progress I have to define lots of rules for code analysis, for example "if this opcode is lui and the next opcode is addiu, then return var = value" or "if this opcode is bne and it refers to an address before the current one, create a loop definition in the parse tree". The problem is that there are tons of such rules and I can't find a good way to define them. I've tried writing a separate function for every rule, I've tried defining nice OOP base logic classes and extending them to create rules, and I've even tried regular expressions on the disassembled code (to my surprise, this works better than expected). But whatever I try, my code soon becomes too big and too hard to read, no matter how well I document and structure it.

This brings me to the conclusion that I am trying to solve this task with the wrong tools (not to mention being too stupid for such a complex task :) ), but I have no real idea what I should try instead. Currently I have two untested ideas: one is using some kind of DSL (I have absolutely no experience with this, so I may be totally wrong), and the other is writing some kind of binary, regexp-like tool for opcode matching.

I hope someone can point me in the right direction, thanks.

+2  A: 

I would guess that some of your rules are too low-level, and that's why they're becoming unmanageable.

Recognising lui followed by addiu as a 32-bit constant load certainly seems very reasonable; but trying to derive control flow from branch instructions at the individual opcode level seems rather more suspect - I think you want to be working with basic blocks there.
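To make the distinction concrete, here is a minimal Python sketch, not taken from any real decompiler. It keeps the lui/addiu rule as an instruction-level peephole pass, then splits the code into basic blocks and looks for backward branches at block level. The `Insn` type, the function names and the operand layout are all hypothetical; it assumes branch delay slots have already been dealt with, that immediates are raw 16-bit values, and it uses a plain address comparison to spot loops where a real decompiler would use dominator analysis.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Insn:                      # hypothetical decoded-instruction record
    addr: int
    op: str
    args: tuple = ()
    target: Optional[int] = None  # branch destination, if statically known

BLOCK_ENDERS = {"beq", "bne", "j", "jr"}

def fold_constants(insns):
    """Peephole rule: lui rX, hi ; addiu rX, rX, lo  ->  li rX, value."""
    out, i = [], 0
    while i < len(insns):
        a = insns[i]
        b = insns[i + 1] if i + 1 < len(insns) else None
        if (b is not None and a.op == "lui" and b.op == "addiu"
                and b.args[0] == a.args[0] == b.args[1]):
            imm = b.args[2]
            lo = imm - 0x10000 if imm & 0x8000 else imm  # addiu sign-extends
            value = ((a.args[1] << 16) + lo) & 0xFFFFFFFF
            out.append(Insn(a.addr, "li", (a.args[0], value)))
            i += 2
        else:
            out.append(a)
            i += 1
    return out

def basic_blocks(insns):
    """Split at branch targets and at instructions following a branch."""
    leaders = {insns[0].addr}
    for idx, ins in enumerate(insns):
        if ins.op in BLOCK_ENDERS:
            if ins.target is not None:
                leaders.add(ins.target)
            if idx + 1 < len(insns):
                leaders.add(insns[idx + 1].addr)
    blocks, current = [], []
    for ins in insns:
        if ins.addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(ins)
    if current:
        blocks.append(current)
    return blocks

def back_edges(blocks):
    """Heuristic: a block whose final branch targets an earlier (or its own) block."""
    starts = {b[0].addr for b in blocks}
    edges = []
    for b in blocks:
        last = b[-1]
        if (last.op in BLOCK_ENDERS and last.target in starts
                and last.target <= b[0].addr):
            edges.append((b[0].addr, last.target))
    return edges

prog = [
    Insn(0x00, "lui", ("t0", 0x1234)),
    Insn(0x04, "addiu", ("t0", "t0", 0x5678)),
    Insn(0x08, "addiu", ("t1", "t1", 0xFFFF)),            # t1 = t1 - 1
    Insn(0x0C, "bne", ("t1", "zero"), target=0x08),
    Insn(0x10, "jr", ("ra",)),
]
blocks = basic_blocks(fold_constants(prog))
loops = back_edges(blocks)
```

With the work split this way, the instruction-level rules stay small and local, and loop detection becomes a question about edges between blocks rather than about individual bne opcodes. A more careful version would only fold pairs that sit inside the same block.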

Cifuentes' Reverse Compilation Techniques is a reference which keeps cropping up in discussions of decompilation that I've seen; from a fairly brief skim, it seems like it would be well worth spending some time reading in detail for your project.

Some of the x86-specific stuff won't be relevant - in particular, the step which translates x86 to a low-level intermediate representation is probably not necessary for MIPS (MIPS is essentially just one basic operation per opcode already) - but otherwise much of the content looks like it should be very useful.
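As an illustration of the "one basic operation per opcode" point, a table from mnemonic to expression template can go a long way for MIPS. The sketch below is only an assumption about how such a table might look; `TEMPLATES`, `lift`, the `mem32[...]` notation and the operand order (destination first, and register, offset, base for loads and stores) are invented for the example.

```python
# Hypothetical table: each MIPS mnemonic maps to one expression template.
TEMPLATES = {
    "addu":  "{0} = {1} + {2}",
    "subu":  "{0} = {1} - {2}",
    "addiu": "{0} = {1} + {2}",
    "and":   "{0} = {1} & {2}",
    "or":    "{0} = {1} | {2}",
    "lui":   "{0} = {1} << 16",
    "lw":    "{0} = mem32[{2} + {1}]",   # lw rt, offset(base)
    "sw":    "mem32[{2} + {1}] = {0}",   # sw rt, offset(base)
}

def lift(op, *operands):
    """Render one instruction as a single assignment, or flag it as unhandled."""
    template = TEMPLATES.get(op)
    if template is None:
        return "/* unhandled: " + op + " */"
    return template.format(*operands)

lines = [
    lift("lw", "t0", 8, "sp"),
    lift("addu", "v0", "a0", "t0"),
    lift("sw", "v0", 12, "sp"),
]
```

Adding a new opcode then means adding a table row rather than a new function or class, which may help with the "tons of rules" problem for the simple cases.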

Matthew Slattery
Oh my, this is an excellent source of papers worth reading. I can't express how thankful I am (please don't ask me how I managed to miss this :) )
Riz