How to determine which classes are referenced in a compiled .Net or Java application?

+1 Q:

I wonder if there's an easy way to determine which classes from a library are "used" by a compiled .NET or Java application, and I need to write a simple utility to do that (so using any of the available decompilers won't do the job).

I don't need to analyze different inputs to figure out if a class is actually created for this or that input set - I'm only concerned whether or not the class is referenced in the application. Most likely the application would subclass from the class I look for and use the subclass.

I've looked through a bunch of .Net .exe's and Java .classes with a hex editor and it appears that the referenced classes are spelled out in plaintext, but I am not sure if it will always be the case - my knowledge of MSIL/Java bytecode is not enough for that. I assume that even though the application itself can be obfuscated, it'll still have to call the library classes by the original name?
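A naive version of the plaintext check described above might look like the following Python sketch. It relies on the fact that both formats store type names as UTF-8-style byte strings (the metadata string heap in .NET assemblies, the constant pool in Java class files). The function names are hypothetical, and the scan deliberately searches for the simple class name only, because .NET stores the namespace separately from the type name and Java class files use `/` rather than `.` between package parts. It can report false positives (for example, a longer name that merely contains the one searched for), so treat a hit as a hint rather than proof.

```python
def mentions_name(binary, simple_name):
    """Return True if the UTF-8 bytes of simple_name occur anywhere in binary."""
    return simple_name.encode("utf-8") in binary


def file_mentions_name(path, simple_name):
    """Read a compiled file (.exe, .dll, .class) and scan it for simple_name."""
    with open(path, "rb") as handle:
        return mentions_name(handle.read(), simple_name)
```

For example, `file_mentions_name("app.exe", "myCoolClass")` would scan a (hypothetical) `app.exe` for the library class name.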

+1 A:

For .NET: it looks like there's an article on MSDN that should help you get started. For what it's worth, for .NET the magic Google words are ".net assembly references".

overslacked 2009-09-02 17:50:31

+2 A:

Extending what overslacked said.

EDIT: For some reason I thought you asked about methods, not types.

Types

Like finding methods, this doesn't cover access through the Reflection API.

You have to locate the following in a Reflector plugin to identify referenced types and perform a transitive closure:

Method parameters
Method return types
Custom attributes
Base types and interface implementations
Local variable declarations
Evaluated sub-expression types
Field, property, and event types

If you parse the IL yourself, all you have to process from the main assembly is the TypeRef and TypeSpec metadata, which is pretty easy (of course, I'm speaking as someone who has parsed the entire byte code). However, the transitive closure would still require you to process the full byte code of each referenced method in the referenced assembly (to get the subexpression types).
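The transitive closure step can be sketched independently of how the direct references are extracted. The Python below assumes a hypothetical, already-built mapping from each type name to the types it references directly (parameters, return types, fields, and so on); building that mapping from real metadata is the hard part and is not shown.

```python
from collections import deque


def transitive_closure(direct_refs, roots):
    """Return every type reachable from roots by following direct references.

    direct_refs: dict mapping a type name to an iterable of type names it
                 references directly (hypothetical input, built elsewhere).
    roots:       iterable of type names to start from.
    """
    seen = set(roots)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for ref in direct_refs.get(current, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen
```

With `direct_refs = {"Program": ["List<SomeType>"], "List<SomeType>": ["SomeType[]"]}`, calling `transitive_closure(direct_refs, ["Program"])` also reports `SomeType[]`, even though `Program` never mentions it directly.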

Methods

If you can write a plugin for Reflector to handle the task, it will definitely be the easiest way. Parsing the IL is non-trivial, though I've done it now so I would just use that code if I had to (just saying it's not impossible). :D

Keep in mind that you may have method dependencies that you don't see on the first pass and that neither approach above will catch. These are due to indirect dispatch via the callvirt (virtual and interface method calls) and calli (generally delegates) instructions. For each type T created with newobj, and for each method M within that type, you'll have to check all callvirt, ldftn, and ldvirtftn instructions to see whether the base definition for the target (if the target is a virtual method) is the same as the base method definition for M in T, or whether M is in the type's interface map (if the target is an interface method). This is not perfect, but it is about the best you can do for static analysis without a theorem prover. The result is a superset of the methods that will actually be called outside of the Reflection API, and a subset of the full set of methods in the assembly(ies).
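As a rough illustration of that check, here is a toy Python model. It is only a sketch under simplifying assumptions: the inputs are hypothetical, precomputed tables (the set of types seen in newobj, a map from each type's methods to their base definitions, and the set of base definitions targeted by callvirt/ldftn/ldvirtftn), and interface maps are left out entirely.

```python
def possible_virtual_targets(instantiated, base_definition, virtual_call_targets):
    """Return (type, method) pairs that indirect dispatch might reach.

    instantiated:         set of type names created with newobj.
    base_definition:      dict: type name -> {method name -> base definition id}.
    virtual_call_targets: set of base definition ids that appear as targets
                          of callvirt, ldftn, or ldvirtftn.
    All three inputs are hypothetical tables built by an earlier pass.
    """
    reachable = set()
    for type_name in instantiated:
        for method, base in base_definition.get(type_name, {}).items():
            # M in T is a candidate if some call site targets M's base definition.
            if base in virtual_call_targets:
                reachable.add((type_name, method))
    return reachable
```

Types that are never instantiated contribute nothing, which is what keeps the result smaller than the full set of methods in the assembly.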

280Z28 2009-09-02 18:03:47

Thanks, but isn't that overly complex for the task? I don't need to know _how_ the class is being used, e.g. I don't care what the subclasses are. The class to test for belongs to a library that is constant, so if there was a line like `public class userClass1 : myCoolClass` in the source code, wouldn't I always find the string "myCoolClass" in the compiled executable?

7macaw 2009-09-02 20:18:19

If you never use an array (`SomeType[]`) in your code, but you have an instance of `List<SomeType>`, do you count the fact that `List<SomeType>` has a private member field of type `SomeType[]`? In this case, neither your code nor assembly references `SomeType[]`, yet both are dependent on it.

280Z28 2009-09-02 23:59:29

In Java, the best mechanism to find class dependencies (in a programmatic fashion) is through bytecode inspection. This can be done with libraries like BCEL or (preferably) ASM. If you wish to parse the class files with your own code, the class file structure is well documented in the Java VM specification.
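If you do parse the class file yourself, the part that matters for this question is the constant pool: every class the file refers to (including itself and its superclass) has a CONSTANT_Class entry pointing at a CONSTANT_Utf8 name. The Python below is a minimal sketch of that idea, not a replacement for BCEL or ASM. The function name is hypothetical, it reads only the header and constant pool, and it decodes names as plain UTF-8, which is an approximation of the "modified UTF-8" the format actually uses.

```python
import struct

# Payload size in bytes for each fixed-width constant-pool tag
# (tag 1, CONSTANT_Utf8, is variable-length and handled separately).
_FIXED_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
                12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def referenced_classes(data):
    """Return the sorted internal names of all CONSTANT_Class entries."""
    magic, _minor, _major, count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    pos = 10                    # first byte after the constant_pool_count
    utf8 = {}                   # constant-pool index -> decoded string
    class_name_indexes = []     # name_index of each CONSTANT_Class entry
    index = 1                   # constant-pool indexes start at 1
    while index < count:
        tag = data[pos]
        pos += 1
        if tag == 1:            # CONSTANT_Utf8: u2 length, then the bytes
            (length,) = struct.unpack_from(">H", data, pos)
            pos += 2
            utf8[index] = data[pos:pos + length].decode("utf-8", errors="replace")
            pos += length
        elif tag in _FIXED_SIZES:
            if tag == 7:        # CONSTANT_Class: u2 name_index
                class_name_indexes.append(struct.unpack_from(">H", data, pos)[0])
            pos += _FIXED_SIZES[tag]
        else:
            raise ValueError("unknown constant-pool tag %d" % tag)
        # Long and Double entries occupy two constant-pool slots.
        index += 2 if tag in (5, 6) else 1
    return sorted({utf8[i] for i in class_name_indexes})
```

Names come back in internal form, e.g. `com/lib/myCoolClass` rather than `com.lib.myCoolClass`, and array types appear as descriptors such as `[Ljava/lang/String;`.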

Note that class inspection won't cover runtime dependencies (like classes loaded using the service API).

McDowell 2009-09-02 22:26:28
