views: 1115
answers: 13

I'm somewhat new to .NET but not new to programming, and I'm somewhat puzzled at the trend and excitement about disassembling compiled .NET code. It seems pointless.

The high-level ease of use of .NET is the reason I use it. I've written C and real (hardware processor) assembly in environments with limited resources. That was the reason to spend the effort on so many meticulous details: efficiency. Up in .NET land, it kind of defeats the purpose of having a high-level object-oriented language if you waste time diving down into the most cryptic details of the implementation. In the course of working with .NET, I have debugged the usual performance issues and the odd race condition, and I've done it all by reading my own source code, never once giving a thought to what intermediate language the compiler is generating. For example, it's pretty obvious that a for(;;) loop is going to be faster than a foreach() on an array, considering that foreach() is going to use an enumeration object with a method call to advance to each item instead of a simple increment of a variable, and this is easy to prove with a tight loop run a few million times (no disassembly required).
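
For instance, here is a minimal sketch of that kind of tight-loop check (the array size and the use of Stopwatch are arbitrary choices, and no disassembly is involved):

    // Rough timing of for vs. foreach over an int[]; numbers vary by machine and build settings.
    using System;
    using System.Diagnostics;

    class LoopTiming
    {
        static void Main()
        {
            int[] data = new int[10000000];
            long sum = 0;

            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < data.Length; i++)
            {
                sum += data[i];
            }
            sw.Stop();
            Console.WriteLine("for:     " + sw.Elapsed);

            sw = Stopwatch.StartNew();
            foreach (int value in data)
            {
                sum += value;
            }
            sw.Stop();
            Console.WriteLine("foreach: " + sw.Elapsed);

            Console.WriteLine(sum); // keep the loops from being optimized away
        }
    }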

What really makes disassembling IL silly is the fact that it's not real machine code; it's virtual machine code. I've heard some people actually like to move instructions around to optimize it. Are you kidding me? Just-in-time compiled virtual machine code can't even run a simple tight for(;;) loop at the speed of natively compiled code. If you want to squeeze every last cycle out of your processor, then use C/C++ and spend time learning real assembly. That way the time you spend understanding lots of low-level details will actually be worthwhile.

So, other than having too much time on their hands, why do people disassemble .NET (CLR) binaries?

+5  A: 

Understanding what compilers for various high-level languages are actually doing with your sources is an important skill to acquire as you move towards mastery of a certain environment, just like, say, understanding how DB engines will plan to execute the various kinds of SQL queries you can toss at them. To use a certain level of abstraction in a masterful way, familiarity with (at least) the level below it is quite a good thing to acquire; see e.g. some notes on my talk on the subject of abstraction and the slides for that talk, as well as Joel Spolsky's "law of leaky abstractions" that I refer to in the talk.

Alex Martelli
I'm all for understanding how things work behind the scenes, but now that you mention DB engines: I made the first several SQL queries I ever wrote run faster by several orders of magnitude simply by looking at them and thinking about the only logical algorithms the database engine must use to interpret them. For example, it doesn't take a trace or database engine source code to realize that a delete of a row that's referenced from a non-indexed table is going to be slower than a delete of a row that's referenced from an indexed table. Good design is common sense.
@unknown (yahoo) - Trust, but verify.
TheSoftwareJedi
I do not believe this is a valid argument because it would be better to use the provided MSIL Disassembler (Ildasm.exe) and not .NET Reflector.
AMissico
A: 

I myself often wonder this... :)

Sometimes there is a need to understand how a specific library method works, or why exactly it works the way it does. There may be a situation where the documentation for the function is vague, or where some odd behavior needs investigation. In such cases some people disassemble libraries to see what calls are made inside certain methods.

As for optimization, I have never heard of this. I think it is ultimately stupid to try to optimize MSIL, since it will then be fed to a translator that generates the real machine code with pretty good efficiency, and your "optimizations" could get lost anyway.

User
I heard about optimization from only this one web site I didn't bother to bookmark and can't find again, and it really made me wonder how that would be practical. Would the person modify the IL by hand whenever a new version of the source was released? It made me wonder whether people write inline IL assembler! Actually, check out this site: http://www.partario.com/blog/2009/04/inline-il-assembly.html Geez. .NET is for rapid application development, isn't it? Which is faster to read/write, IL or C#/VB.NET?
A: 

Each .NET language implements its own subset of CLR functionality. Knowing that the CLR is capable of things that the language you're currently using isn't can let you make an informed decision on whether to change languages or emit IL or find another way.
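
As a hypothetical illustration of what "emit IL" can mean in practice, here is a minimal sketch that builds a DynamicMethod with System.Reflection.Emit and writes the opcodes by hand (a trivial add, purely for illustration):

    // Sketch: hand-written IL for a tiny add function via DynamicMethod.
    using System;
    using System.Reflection.Emit;

    class EmitSketch
    {
        static void Main()
        {
            DynamicMethod add = new DynamicMethod(
                "Add", typeof(int), new[] { typeof(int), typeof(int) });

            ILGenerator il = add.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0); // push first argument
            il.Emit(OpCodes.Ldarg_1); // push second argument
            il.Emit(OpCodes.Add);     // add them
            il.Emit(OpCodes.Ret);     // return the result

            var addFunc = (Func<int, int, int>)add.CreateDelegate(typeof(Func<int, int, int>));
            Console.WriteLine(addFunc(2, 3)); // prints 5
        }
    }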

Your assumption that the only reason people do things like this is because they have too much time is insulting and uneducated.

overslacked
I believe the author is correct in his assumption. You will only do disassembling if you really need to get inside something or if you are really curious (for which you need time). If something didn't work out of the box, I would rather try a different approach than spend unpredictable amounts of time trying to disassemble stuff. Most people disassembling libraries, patching executables, modifying firmware for cell phones, etc. are either very curious and have too much time, or are hackers expecting to make some profit.
User
@Mastermind, I see what you mean. I'm not just referring to disassembling though; I'm also speaking of having an understanding of the CLR as a separate entity from any particular .NET language. Maybe that seems beside the point, but I think it's important.
overslacked
What I'm questioning is the purpose of the CLR vs. the purpose of the .NET classes and the languages that leverage them. Having a high-level abstraction that can be ported to different operating systems is useful, but the more low-level details are introduced into that abstraction, the less useful it becomes to the average human being. If there are truly differences in CLR support across different .NET languages, then I think .NET is failing. It's simply making the world just as complicated as it used to be, except burning more CPU cycles and memory doing it.
@unknown - The CLR is, as you've said, a VM. However, its capabilities are not tied specifically to any of the .NET languages (just as traditional hardware instruction sets are not tied to high-level languages) - it's up to the compilers for each language to produce CLR-compatible MSIL. I consider this evidence of a robust architecture, and I'm surprised that you don't appreciate it more, given your hardware background.
overslacked
+1  A: 

I've used it when the source code has been lost or what's in version control in a particular tagged release doesn't appear to correspond to the shipped binary.

Cade Roux
I've also decompiled in the past to find out how some encryption algos were applied.
tom
A: 

After just completing a 4-day course in secure software development, I would say that many people would decompile a binary to find any vulnerabilities in it. Knowing the source of a client application could help in planning an attack on a server.

Of course, with little utilities and such, there wouldn't be any such issues.

If I remember correctly, there is an app out there that obfuscates your .NET binaries. I believe it is called Dotfuscator.

KFro
A: 

To locate library bugs and figure out how to work around them.

For example: without reflection, you cannot remote an exception and rethrow it without slaughtering its backtrace. The framework, however, can do it.
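
As a concrete sketch of that kind of reflection trick: a commonly cited workaround is to call the .NET Framework's non-public InternalPreserveStackTrace method before rethrowing. Whether that method exists (and keeps behaving this way) depends on the framework version, so treat this as an illustration rather than a guaranteed API:

    // Sketch: preserve an exception's original stack trace before rethrowing it,
    // using the framework's internal InternalPreserveStackTrace method (an
    // implementation detail that may not be present in every .NET version).
    using System;
    using System.Reflection;

    static class ExceptionHelper
    {
        private static readonly MethodInfo PreserveStackTrace =
            typeof(Exception).GetMethod("InternalPreserveStackTrace",
                BindingFlags.Instance | BindingFlags.NonPublic);

        public static void Rethrow(Exception ex)
        {
            if (PreserveStackTrace != null)
            {
                PreserveStackTrace.Invoke(ex, null); // keep the original backtrace
            }
            throw ex;
        }
    }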

Joshua
A: 

To understand how the underlying system is implemented, to see what the IL equivalent of some high-level code is, to circumvent licensing...

Mehrdad Afshari
A: 

From your question it looks like you do not know that Reflector disassembles CLR assemblies back to C# or VB, so you pretty much see the original code, not IL!

zvolkov
You see IL rendered as code. The code you see can be considerably different from the original code.
AMissico
A: 

I have used it in the following cases, among others:

  1. Had trouble with an internal assembly for which I did not have the source code.
  2. Needed to figure out how a particular third-party controls library looks for a run-time license.
  3. Needed to find out how the .Net license compiler works. (Just placed lc.exe inside Reflector)
  4. Used it to make sure I had the correct build of certain libraries.
siz
It sounds like you would have been helped if the people that produced a couple of those executables simply gave you the source code in a .zip file. Why would those people not release their source when they know you can easily view it with Reflector anyway?
Of course. But this was a situation in which the assembly was old and it would have taken me time to look up who created the assembly, whether they still worked at the company, where the source code could be, etc. It is just easier to open up Reflector and drag the assembly in there. It's a matter of convenience for me, more than necessity.
siz
A: 

To understand how to use a poorly documented interface.

(Sadly, it's all too common in .NET-based tools such as BizTalk or WCF to have only generic, generated documentation, so disassembling to C# is sometimes necessary to see what a method is doing and in which context to use it.)

ckarras
Hah! That's funny. Isn't one of the goals of object-oriented programming to "separate the interface from the implementation"?
I wouldn't say that it's a goal specific to OO. OO provides tools that can help in making understandable and convenient interfaces, but it has to be used with that goal in mind (using OO won't automatically guarantee a self-documenting interface). So the need for Reflector as a substitute for documentation is most certainly a sign of a flawed interface, but as long as these flawed interfaces exist, it's a necessary evil.
ckarras
A: 

Actually, a foreach over an int[] gets compiled into a for statement. If we cast it to an IEnumerable<int>, you are right: it uses an enumerator. HOWEVER, that strangely makes it FASTER, since there is no incrementing of a temp int. To prove this, we use benchmarking coupled with the decompiler for added understanding...

So I think by asking this question, you really answered it yourself.

If this benchmark differs from yours, please let me know how. I tried it with object arrays, nulls, etc, etc...

code:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            int[] ints = Enumerable.Repeat(1, 50000000).ToArray();

            while (true)
            {
                // Empty for loop over the array.
                DateTime now = DateTime.Now;
                for (int i = 0; i < ints.Length; i++)
                {
                    //nothing really
                }
                Console.WriteLine("for loop: " + (DateTime.Now - now));

                // For loop that also reads each element.
                now = DateTime.Now;
                for (int i = 0; i < ints.Length; i++)
                {
                    int nothing = ints[i];
                }
                Console.WriteLine("for loop with assignment: " + (DateTime.Now - now));

                // foreach directly over the array.
                now = DateTime.Now;
                foreach (int i in ints)
                {
                    //nothing really
                }
                Console.WriteLine("foreach: " + (DateTime.Now - now));

                // foreach over the array cast to IEnumerable<int>.
                now = DateTime.Now;
                foreach (int i in (IEnumerable<int>)ints)
                {
                    //nothing really
                }
                Console.WriteLine("foreach casted to IEnumerable<int>: " + (DateTime.Now - now));
            }
        }
    }

results:

for loop: 00:00:00.0273438
for loop with assignment: 00:00:00.0712890
foreach: 00:00:00.0693359
foreach casted to IEnumerable<int>: 00:00:00.6103516
for loop: 00:00:00.0273437
for loop with assignment: 00:00:00.0683594
foreach: 00:00:00.0703125
foreach casted to IEnumerable<int>: 00:00:00.6250000
for loop: 00:00:00.0273437
for loop with assignment: 00:00:00.0683594
foreach: 00:00:00.0683593
foreach casted to IEnumerable<int>: 00:00:00.6035157
for loop: 00:00:00.0283203
for loop with assignment: 00:00:00.0771484
foreach: 00:00:00.0771484
foreach casted to IEnumerable<int>: 00:00:00.6005859
for loop: 00:00:00.0273438
for loop with assignment: 00:00:00.0722656
foreach: 00:00:00.0712891
foreach casted to IEnumerable<int>: 00:00:00.6210938

decompiled (note that the empty foreach had to add a variable assignment for the current element, something our empty for loop didn't have but the foreach obviously needed):

private static void Main(string[] args)
{
    int[] ints = Enumerable.Repeat<int>(1, 0x2faf080).ToArray<int>();
    while (true)
    {
        DateTime now = DateTime.Now;
        for (int i = 0; i < ints.Length; i++)
        {
        }
        Console.WriteLine("for loop: " + ((TimeSpan) (DateTime.Now - now)));
        now = DateTime.Now;
        for (int i = 0; i < ints.Length; i++)
        {
            int num1 = ints[i];
        }
        Console.WriteLine("for loop with assignment: " + ((TimeSpan) (DateTime.Now - now)));
        now = DateTime.Now;
        int[] CS$6$0000 = ints;
        for (int CS$7$0001 = 0; CS$7$0001 < CS$6$0000.Length; CS$7$0001++)
        {
            int num2 = CS$6$0000[CS$7$0001];
        }
        Console.WriteLine("foreach: " + ((TimeSpan) (DateTime.Now - now)));
        now = DateTime.Now;
        using (IEnumerator<int> CS$5$0002 = ((IEnumerable<int>) ints).GetEnumerator())
        {
            while (CS$5$0002.MoveNext())
            {
                int current = CS$5$0002.Current;
            }
        }
        Console.WriteLine("foreach casted to IEnumerable<int>: " + ((TimeSpan) (DateTime.Now - now)));
    }
}
TheSoftwareJedi
A: 

Something that folks haven't mentioned is that Reflector comes in super useful if you use a compile-time weaving AOP framework like PostSharp.
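
As a hypothetical illustration of why (exact namespaces and argument types vary between PostSharp versions, so take this as a sketch of the OnMethodBoundaryAspect pattern rather than a version-specific recipe): the tracing calls below exist only in the compiled assembly after weaving, so opening that assembly in Reflector is the natural way to see what actually got injected.

    // Sketch of a compile-time-woven aspect; the woven output is what you would
    // inspect with Reflector. Namespaces/signatures follow PostSharp's
    // OnMethodBoundaryAspect pattern and may differ by version.
    using System;
    using PostSharp.Aspects;

    [Serializable]
    public class TraceAspect : OnMethodBoundaryAspect
    {
        public override void OnEntry(MethodExecutionArgs args)
        {
            Console.WriteLine("Entering " + args.Method.Name);
        }

        public override void OnExit(MethodExecutionArgs args)
        {
            Console.WriteLine("Leaving " + args.Method.Name);
        }
    }

    public class OrderService
    {
        [TraceAspect]
        public void PlaceOrder()
        {
            // Body unchanged in source; the tracing calls are woven in at build time.
        }
    }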

Egor
A: 

To learn.

Articles are nice, but they do not present production code. Without .NET Reflector, it would have taken me a couple of weeks to figure out how Microsoft implemented events in the FileSystemWatcher component. Instead, it took only a few hours, and I was able to finish my FileSystemSearcher component.
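
For instance, what reading the decompiled FileSystemWatcher teaches is the standard protected-virtual "On*" event-raising idiom; the sketch below applies it to a FileSystemSearcher, with the FileFound event and its argument type invented purely for illustration:

    // Sketch of the framework-style event pattern one learns from decompiled code;
    // the FileFound event and FileFoundEventArgs are hypothetical.
    using System;
    using System.IO;

    public class FileFoundEventArgs : EventArgs
    {
        public FileFoundEventArgs(string fullPath) { FullPath = fullPath; }
        public string FullPath { get; private set; }
    }

    public class FileSystemSearcher
    {
        public event EventHandler<FileFoundEventArgs> FileFound;

        // Raise the event through a protected virtual method, as the framework
        // components do, so derived classes can intercept or extend it.
        protected virtual void OnFileFound(FileFoundEventArgs e)
        {
            EventHandler<FileFoundEventArgs> handler = FileFound;
            if (handler != null)
            {
                handler(this, e);
            }
        }

        public void Search(string directory)
        {
            foreach (string path in Directory.GetFiles(directory))
            {
                OnFileFound(new FileFoundEventArgs(path));
            }
        }
    }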

AMissico