I am using the ASM bytecode manipulation framework to perform static analysis on Java code. I wish to detect when fields of an object are reassigned, i.e. when this kind of code occurs:

class MyObject {
    private int value;
    void setValue(int newValue) { this.value = newValue; }
}

The following code (in a class implementing MethodVisitor) detects the above situation:

@Override
public void visitFieldInsn(int opcode, String owner, String name, String desc) {
    if(opcode == Opcodes.PUTFIELD) {
        // do whatever here
    }
}

However, this code is called regardless of the object which owns the field. I would like to find the more specific case where the PUTFIELD operation is executed on the this object. For example, I want to distinguish between the first code snippet, and code such as this:

public MyObject createNewObjectWithDifferentField() {
    MyObject newObject = new MyObject();
    newObject.value = 43;
    return newObject;
}

In the above case, the PUTFIELD operation is still executed, but here it's on a local variable (newObject) rather than the this object. This will depend on the state of the stack at the time of the assignment, but I have come across a few different scenarios where the bytecode is totally different, and I'm looking for ways to handle this complexity.

How do I check that PUTFIELD is reassigning a field belonging to this object?


Edit

I'm using ASM to perform analysis only, rather than instrumenting existing bytecode. Preferably I'd like to find a way of discovering this without altering the bytecode, if possible.

A: 

I think that in the general case it's impossible. Consider:

class MyObject {
  private int value;
  void mymethod1() {
    mymethod2(Math.random() > 0.5 ? this : new MyObject());
  }

  void mymethod2(MyObject that) {
    that.value = 1;
  }
}

In simpler cases you can track the stack back to ALOAD 0, which in an instance method refers to this.
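As a rough illustration of tracking the stack back to ALOAD 0, here is a toy model in plain Java. It does not use ASM: the Op, Insn and Value types and the fieldsWrittenOnThis method are made up for this sketch, and it only handles straight-line code with single-slot values. It also assumes an instance method in which local variable 0 is never overwritten. A real analysis would have to merge stack states at branch targets, which is the part ASM's analyser is meant to take care of.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model, NOT the ASM API: a hand-rolled instruction list and abstract stack. */
class ThisFieldWriteSketch {

    /** Hypothetical mini instruction set, just enough for the example. */
    enum Op { ALOAD, ILOAD, BIPUSH, PUTFIELD }

    /** One instruction: operand is a local slot or constant, field is set for PUTFIELD only. */
    static final class Insn {
        final Op op;
        final int operand;
        final String field;

        Insn(Op op, int operand, String field) {
            this.op = op;
            this.operand = operand;
            this.field = field;
        }
    }

    /** Abstract value: all we track is whether it was loaded from local slot 0. */
    enum Value { THIS, OTHER }

    /** Names of fields written through local 0 in a straight-line instance method. */
    static List<String> fieldsWrittenOnThis(List<Insn> code) {
        Deque<Value> stack = new ArrayDeque<>();
        List<String> hits = new ArrayList<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case ALOAD:
                    // Assumption: instance method, local 0 never reassigned.
                    stack.push(insn.operand == 0 ? Value.THIS : Value.OTHER);
                    break;
                case ILOAD:
                case BIPUSH:
                    stack.push(Value.OTHER);
                    break;
                case PUTFIELD:
                    stack.pop();                 // the value being stored
                    Value target = stack.pop();  // the object reference
                    if (target == Value.THIS) {
                        hits.add(insn.field);
                    }
                    break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // this.value = newValue;
        List<Insn> setter = Arrays.asList(
                new Insn(Op.ALOAD, 0, null),
                new Insn(Op.ILOAD, 1, null),
                new Insn(Op.PUTFIELD, 0, "value"));
        // newObject.value = 43;  (newObject assumed to live in local 1)
        List<Insn> other = Arrays.asList(
                new Insn(Op.ALOAD, 1, null),
                new Insn(Op.BIPUSH, 43, null),
                new Insn(Op.PUTFIELD, 0, "value"));
        System.out.println(fieldsWrittenOnThis(setter)); // prints [value]
        System.out.println(fieldsWrittenOnThis(other));  // prints []
    }
}
```

The mymethod2 example above shows the limit of this kind of tracking: there the object reference comes from a parameter, so the sketch would classify it as OTHER even when the caller happens to pass this.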

Jevgeni Kabanov
I think you're right in that there will probably always be a way to trick my analysis, but the more esoteric the code gets, the less I'm worried about it :) In terms of tracking `ALOAD_0`, is the `this` reference *guaranteed* to be in position zero of the local variable table? I was under the impression that it was usually the case, but not always...
Grundlefleck
Yes it is guaranteed for instance methods and constructors. Check the JVMS section 4.9.4
HH
ASM includes a verifying analyser, which you should be able to use to track back the stack to ALOAD_0. That's all there is to it inside one method. Between methods, it's a different story.
Jevgeni Kabanov
A: 

I've never used ASM, however, I have experience with bytecode manipulation.

Right before the PUTFIELD instruction, the stack looks like this:

|...,object_ref,value

or

|...,object_ref,value1,value2 (if the type of the field is double or long)

Taking the first case, you can insert the following instructions before the PUTFIELD:

1: DUP2
2: POP
3: ALOAD_0
4: IF_ACMPNE X
5: put your code here
...
...
X: PUTFIELD

Instruction (1) duplicates the object_ref and the value on the stack. (2) removes the duplicated value. (3) loads the 'this' reference. (4) compares the two references: if object_ref is 'this', execution falls through to your code; otherwise it jumps straight to the PUTFIELD.

For the second case (long or double field) you can use this series of bytecode instructions:

1: DUP2_X1
2: POP2
3: DUP
4: ALOAD_0
5: IF_ACMPNE 7
6: put your code here
...
...
7: DUP_X2
8: POP
9: PUTFIELD
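To check that the long/double sequence leaves the operand stack as PUTFIELD expects it, one can simulate it slot by slot. The following is only a sketch in plain Java: the class and method names are invented for illustration, references and the two halves of the wide value are modelled as strings, and reference identity is approximated by string equality.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Slot-level simulation of the wide-field sequence; top of stack is the end of the list. */
class StackShuffleSketch {

    static String pop(List<String> s) {
        return s.remove(s.size() - 1);
    }

    /** DUP2_X1: ..., v3, v2, v1 becomes ..., v2, v1, v3, v2, v1 */
    static void dup2X1(List<String> s) {
        String v1 = pop(s), v2 = pop(s), v3 = pop(s);
        s.addAll(Arrays.asList(v2, v1, v3, v2, v1));
    }

    /** DUP_X2: ..., v3, v2, v1 becomes ..., v1, v3, v2, v1 */
    static void dupX2(List<String> s) {
        String v1 = pop(s), v2 = pop(s), v3 = pop(s);
        s.addAll(Arrays.asList(v1, v3, v2, v1));
    }

    /**
     * Runs instructions 1-8 of the sequence on the given stack (mutated in place)
     * and reports whether the inserted code at step 6 would have executed.
     */
    static boolean hookWouldRun(List<String> stack, String thisRef) {
        dup2X1(stack);                       // 1: DUP2_X1
        pop(stack);                          // 2: POP2 (two slots)
        pop(stack);
        stack.add(stack.get(stack.size() - 1)); // 3: DUP
        stack.add(thisRef);                  // 4: ALOAD_0
        String b = pop(stack);               // 5: IF_ACMPNE pops both references
        String a = pop(stack);
        boolean jump = !a.equals(b);
        // 6: the inserted code would run here when jump is false
        dupX2(stack);                        // 7: DUP_X2
        pop(stack);                          // 8: POP
        return !jump;                        // stack is now ready for 9: PUTFIELD
    }

    public static void main(String[] args) {
        List<String> stack = new ArrayList<>(Arrays.asList("this", "hi", "lo"));
        System.out.println(hookWouldRun(stack, "this")); // prints true
        System.out.println(stack);                       // prints [this, hi, lo]
    }
}
```

In both the matching and the non-matching case the stack ends up as object_ref followed by the two value slots, which is the layout PUTFIELD consumes.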
HH
Thanks for your insight into the state of the stack. However, I'm using ASM to perform static analysis, your solution involves both manipulating the bytecode (which I'd rather not do) and executing the code (which I definitely can't do). Thanks though, particularly for pointing out the difference in the stack with longs, doubles vs. other data types.
Grundlefleck
Yep, I missed the static analysis part. Most of my work was on dynamic analysis, so this solution just popped out in my brain. In this case I agree with Jevgeni's answer.
HH
A: 

An alternative approach (runtime):

You could use AspectJ and set up field set/get pointcuts for your Class. See: http://www.eclipse.org/aspectj/doc/released/progguide/semantics-pointcuts.html and http://www.eclipse.org/aspectj/.

After defining your pointcuts, you'd write some advice that simply prints out the current location of execution by using the thisJoinPoint variable. Then, when running your program, you'd have a nice log of everywhere the fields were get/set.

This would require either runtime or compile-time weaving, which means bytecode manipulation either way. Hope this helps...

Andy
Unfortunately, as I'm analysing the code statically, without executing it, this approach won't work. Thanks anyway :)
Grundlefleck
Aah, well, good luck with the ASM approach. Do you have the source or are you analyzing a binary?
Andy