I have some MSIL in byte format (result of reflection's GetMethodBody()) that I'd like to analyze a bit. I'd like to find all classes created with the new operator in the MSIL. Any ideas on how to do that programmatically?
You could take a look at CCI, the engine behind tools like FxCop, or at Mono's Cecil, on which Gendarme is based. Both are built for this kind of task, among others.
Check out this article on CodeProject: http://www.codeproject.com/KB/cs/sdilreader.aspx
Its source code lets you turn the IL byte[] into a list of instructions. If you are dealing with generics, you may want to scroll through the article's messages and look for a post of mine (Bug Fix for Generic) that fixes some bugs related to generics; they only matter when you want to turn the IL into display text.
Once you have all the IL instructions, all you need to do is loop through them and increment a count whenever an instruction's opcode (instruction.code) matches OpCodes.Newobj or OpCodes.Newarr.
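The counting loop described above can be sketched without the article's reader. This is only a minimal illustration: NewobjCounter, CountAllocations and OperandSize are hypothetical names that do not come from the CodeProject source, and the sketch assumes the IL is well formed and that the platform is little-endian, since BitConverter is used to read the switch case count.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper, not taken from the CodeProject article.
public static class NewobjCounter
{
    // Maps OpCode.Value to its OpCode; built once by reflecting over OpCodes.
    static readonly Dictionary<short, OpCode> Lookup = BuildLookup();

    static Dictionary<short, OpCode> BuildLookup()
    {
        var table = new Dictionary<short, OpCode>();
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Static | BindingFlags.Public))
        {
            var code = (OpCode)field.GetValue(null);
            table[code.Value] = code;
        }
        return table;
    }

    // Counts the newobj and newarr instructions in a method or constructor body.
    public static int CountAllocations(MethodBase method)
    {
        MethodBody body = method.GetMethodBody();
        if (body == null) return 0; // abstract or extern members have no IL body
        byte[] il = body.GetILAsByteArray();
        int count = 0, pos = 0;
        while (pos < il.Length)
        {
            int value = il[pos++];
            if (value == 0xFE) value = (value << 8) | il[pos++]; // two-byte opcode
            OpCode code;
            if (!Lookup.TryGetValue(unchecked((short)value), out code))
                throw new InvalidProgramException();
            if (code == OpCodes.Newobj || code == OpCodes.Newarr) count++;
            pos += OperandSize(code.OperandType, il, pos); // skip the operand
        }
        return count;
    }

    // Size in bytes of the operand that starts at il[pos].
    static int OperandSize(OperandType type, byte[] il, int pos)
    {
        switch (type)
        {
            case OperandType.InlineNone:
                return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar:
                return 1;
            case OperandType.InlineVar:
                return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR:
                return 8;
            case OperandType.InlineSwitch:
                // 4-byte case count followed by one 4-byte target per case
                return 4 + 4 * BitConverter.ToInt32(il, pos);
            default:
                return 4; // tokens, int32, float32 and long branch targets
        }
    }
}
```

A call such as NewobjCounter.CountAllocations(someMethodInfo) accepts either a MethodInfo or a ConstructorInfo, because both derive from MethodBase.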
If you want to understand the internals of MSIL better, I strongly recommend the book "Compiling for the .NET CLR" by John Gough.
I ended up using the MSIL parser here: http://blogs.msdn.com/zelmalki/archive/2008/12/11/msil-parser.aspx, with the source slightly modified to work on ConstructorInfo as well as MethodInfo (the results returned from reflection).
It gives you a list of operations, each with its opcode and parameters. The opcode's operand type tells you how to interpret the parameters. The parameters are in binary form, so you need to use MethodInfo.Module.Resolve*() to get the actual parameter values.
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Reflection.Emit;
using System.Text;
namespace AspnetReflection
{
public class MsilReader
{
static readonly Dictionary<short, OpCode> _instructionLookup;
static readonly object _syncObject = new object();
readonly BinaryReader _methodReader;
MsilInstruction _current;
Module _module; // Need to resolve method, type tokens etc
static MsilReader()
{
if (_instructionLookup == null)
{
lock (_syncObject)
{
if (_instructionLookup == null)
{
_instructionLookup = GetLookupTable();
}
}
}
}
public MsilReader(MethodInfo method)
{
if (method == null)
{
throw new ArgumentNullException("method");
}
_module = method.Module;
_methodReader = new BinaryReader(new MemoryStream(method.GetMethodBody().GetILAsByteArray()));
}
public MsilReader(ConstructorInfo constructor)
{
if (constructor == null)
{
throw new ArgumentNullException("constructor");
}
_module = constructor.Module;
_methodReader = new BinaryReader(new MemoryStream(constructor.GetMethodBody().GetILAsByteArray()));
}
public MsilInstruction Current
{
get { return _current; }
}
public bool Read()
{
if (_methodReader.BaseStream.Length == _methodReader.BaseStream.Position)
{
return false;
}
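int instructionValue;
if (_methodReader.BaseStream.Length - 1 == _methodReader.BaseStream.Position)
{
// Only one byte is left, so it must be a single-byte opcode.
instructionValue = _methodReader.ReadByte();
}
else
{
// Read two bytes (little-endian): the low byte is the first byte in the stream.
instructionValue = _methodReader.ReadUInt16();
if ((instructionValue & OpCodes.Prefix1.Value) != OpCodes.Prefix1.Value)
{
// Single-byte opcode: keep the first byte and step back over the second.
instructionValue &= 0xff;
_methodReader.BaseStream.Position--;
}
else
{
// Two-byte opcode (0xFE prefix): swap the bytes so the value matches OpCode.Value.
instructionValue = ((0xFF00 & instructionValue) >> 8) |
((0xFF & instructionValue) << 8);
}
}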
OpCode code;
if (!_instructionLookup.TryGetValue((short) instructionValue, out code))
{
throw new InvalidProgramException();
}
int dataSize = GetSize(code.OperandType);
var data = new byte[dataSize];
_methodReader.Read(data, 0, dataSize);
_current = new MsilInstruction(code, data);
return true;
}
static int GetSize(OperandType opType)
{
switch (opType)
{
case OperandType.InlineNone:
return 0;
case OperandType.ShortInlineBrTarget:
case OperandType.ShortInlineI:
case OperandType.ShortInlineVar:
return 1;
case OperandType.InlineVar:
return 2;
case OperandType.InlineBrTarget:
case OperandType.InlineField:
case OperandType.InlineI:
case OperandType.InlineMethod:
case OperandType.InlineSig:
case OperandType.InlineString:
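case OperandType.InlineSwitch: // NOTE: covers only the 4-byte case count; the jump table (4 bytes per case) follows and is not consumed here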
case OperandType.InlineTok:
case OperandType.InlineType:
case OperandType.ShortInlineR:
return 4;
case OperandType.InlineI8:
case OperandType.InlineR:
return 8;
default:
return 0;
}
}
static Dictionary<short, OpCode> GetLookupTable()
{
var lookupTable = new Dictionary<short, OpCode>();
FieldInfo[] fields = typeof (OpCodes).GetFields(BindingFlags.Static | BindingFlags.Public);
foreach (FieldInfo field in fields)
{
var code = (OpCode) field.GetValue(null);
lookupTable.Add(code.Value, code);
}
return lookupTable;
}
}
public struct MsilInstruction
{
public readonly byte[] Data;
public readonly OpCode Instruction;
public MsilInstruction(OpCode code, byte[] data)
{
Instruction = code;
Data = data;
}
public override string ToString()
{
var builder = new StringBuilder();
builder.Append(Instruction.Name + " ");
if (Data != null && Data.Length > 0)
{
builder.Append("0x");
foreach (byte b in Data)
{
builder.Append(b.ToString("x2"));
}
}
return builder.ToString();
}
}
}
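To go from the raw operand bytes to the class being created, the Data of a newobj instruction can be read as a 4-byte metadata token and resolved through the module. The following is a rough sketch of that step, not part of the parser linked above: OperandResolver and ResolveCreatedType are hypothetical names, and the sketch assumes a little-endian platform because it uses BitConverter.

```csharp
using System;
using System.Reflection;

// Hypothetical helper; it pairs with MsilInstruction.Data but does not depend on it.
public static class OperandResolver
{
    // owner:   the method or constructor whose IL contains the newobj instruction
    // operand: the 4 operand bytes of that instruction (a metadata token)
    public static Type ResolveCreatedType(MethodBase owner, byte[] operand)
    {
        int token = BitConverter.ToInt32(operand, 0);

        // The generic context is needed when the token refers to a member of a generic type or method.
        Type[] typeArgs = owner.DeclaringType != null
            ? owner.DeclaringType.GetGenericArguments()
            : null;
        Type[] methodArgs = owner.IsGenericMethod
            ? owner.GetGenericArguments()
            : null;

        // For newobj the token identifies a constructor; its declaring type is the class being created.
        MethodBase ctor = owner.Module.ResolveMethod(token, typeArgs, methodArgs);
        return ctor.DeclaringType;
    }
}
```

With the reader above, this would be called as OperandResolver.ResolveCreatedType(method, reader.Current.Data) whenever reader.Current.Instruction equals OpCodes.Newobj. A newarr operand is a type token instead, so it would go through Module.ResolveType.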
I've also found the code Frank found to be very useful, but it has one problem: a switch opcode is not processed correctly.
According to MSDN, the opcode is followed by an int32 containing the number of items in the jump table, and then by one int32 jump target per item. Each target is an offset relative to the start of the instruction that follows the switch. So a switch with 3 items actually has 16 data bytes (4 + 3 × 4), not 4.
I'm using the code from Ziad Elmalki's second post on the subject, which includes a GetData method to identify things like the target of a method call.
I corrected the processing of switch opcodes by changing how GetData handles them, to look more like this:
case OperandType.InlineSwitch:
{
int numberOfCases = BitConverter.ToInt32(rawData, 0);
int[] caseAddresses = new int[numberOfCases];
byte[] caseData = new byte[4];
for (int i = 0; i < numberOfCases; i++)
{
_methodReader.Read(caseData, 0, caseData.Length);
caseAddresses[i] = BitConverter.ToInt32(caseData, 0);
}
data = caseAddresses;
}
break;
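For reference, the same idea as a standalone sketch that does not depend on GetData. SwitchOperand and ReadTargets are hypothetical names. Unlike the snippet above, which stores the raw relative values, this version converts each target to an absolute IL offset by adding the position of the instruction that follows the switch.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Ziad Elmalki's parser.
public static class SwitchOperand
{
    // The reader must be positioned just after the switch opcode byte.
    // Returns the absolute IL offsets of the jump targets.
    public static int[] ReadTargets(BinaryReader reader)
    {
        int count = reader.ReadInt32(); // number of cases in the jump table

        // Targets are relative to the first byte after the whole switch instruction.
        long nextInstruction = reader.BaseStream.Position + 4L * count;

        var targets = new int[count];
        for (int i = 0; i < count; i++)
        {
            targets[i] = (int)(nextInstruction + reader.ReadInt32());
        }
        return targets;
    }
}
```

After the call, the stream position sits on the next instruction, so a reader loop can simply continue from there.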