views:

104

answers:

2

I build a CFG out of an arbitrary IL and want to convert that CFG back to IL. The order of the vertices in the CFG is of course not equal to the order of the original IL instructions.

This is fine but overcomplicates some stuff. Imagine:

Jump 'B'
'C': Return
'B': Jump 'C'

This would result in a flow graph like this: (Jump B) -> (Jump C) -> (Return) This is of course a simplified example but it shows the problem when converting out of the CFG.

Is there any information available on this topic in academia? I thought traversing the graph bottom-up would be very elegant but that does not work in more complicated cases.

A solution might be to walk top-down and search for a CF merge but in that case I would not be able to handle loops correct. So the only way to get this right seems to be to search for a possible CF merge if it occurs. If not, we have to have a loop, which means the loop is preferred and the continuing path is evaluated afterwards. This sounds like a solveable problem but it is also very expensive and there might exist a more elegant solution to the problem. Besides a loop could also result in a CF merge when thinking about a "break" statement.

A: 

This is easier than it sounds like. Basically one may get rid of any Jump statements in the CFG which results in an optimized graph. Jump statements will be inserted back once the graph is linearized. This does not keep the original order of instructions but results in a method with the same control flow.

Joa Ebert
+2  A: 

Converting CFG into IL: you want to walk over the graph, emitting each vertex exactly once (except those that are unreachable). So you need to record which vertices have been emitted: a flag on the vertex would do, or a hash from vertex to True/False.

Some vertices will have more than one successor, and you can only follow one of them directly; so you want a way to keep track of vertices that you want to come back to later. A queue is suitable for this.

This is more-or-less what I've used.

def needs_label(cfg, v, last):
    if cfg.predecessors(v) > 1:
        # There's more than one way of entering this vertex
        return True
    elif cfg.predecessors(v) == 1 and last != cfg.predecessors(v)[0]:
        # There's only one way, but the last vertex emitted was not that way
        # so it will be entered using a jump.
        return True
    else:
        return False

def emit_label(v):
    print 'label%d' % (v.id)

def emit_vertex(v):
    if v.type == 'branch':
        # Branch to second successor
        print 'br label%d' % cfg.successors(v)[1].id
    else:
        ...

def emit_jump(v):
    print 'jmp label%d' % v.id

def emit_cfg(cfg):
    q = Queue()   # Queue for saving vertices that we want to emit later
    done = {}    # Hash recording which vertices have already been emitted
    q.push(cfg.start())
    while not q.empty():
        v = q.pop()
        last = None
        while v is not None and not done[v]:
            # Emit the vertex, with a prefixed label if necessary
            if needs_label(cfg, v, last):
                emit_label(v)
            emit_vertex(v)
            done[v] = True
            last = v
            # Get the vertex's successors
            succs = cfg.successors(v)
            # If there aren't any, then this path is finished, so go back to
            # the outer loop to pop another saved vertex
            if len(succs) == 0:
                v = None   # Setting this will terminate the inner loop
                continue
            # Stick all the vertices on the stack for later, in case we can't
            # process them all here
            for s in succs:
                q.push(s)
            # Pick a new vertex from the list of successors.  Always pick the first
            # because if it's a branch then the second will have been branched on
            v = succs[0]
            # If it was emitted earlier we need to jump to it
            if done[v]:
                emit_jump(v)
                v = None
            # Otherwise continue the inner loop, walking from the new vertex

Treatment of branches (vertices with more than one successor) is pretty naive: normally you want to figure out which is more likely and follow that one directly, if possible.

Edmund
An instruction needs a label because it is the leader of a basicblock, which has no predecessor, or more than one predecessors, or only one predecessor but the predecessor has more than one successors.
forgot