I was thinking back to my freshman year of college (five years ago), when I took an exam to place out of intro-level computer science. There was a question about loop invariants, and I was wondering whether a loop invariant is really necessary in this case or whether the question was simply a bad example. The question was to write an iterative definition of a factorial function, and then to prove that the function was correct.

The code that I provided for the factorial function was as follows:

public static int factorial(int x)
{
    if (x < 0) {
        throw new IllegalArgumentException("Parameter must be >= 0");
    } else if (x == 0) {
        return 1;
    } else {
        int result = 1;
        for (int i = 1; i <= x; i++) {
            result *= i;
        }
        return result;
    }
}

My own proof of correctness was a proof by cases, and in each case I asserted that the function was correct by definition (x! is undefined for negative values, 0! is 1, and x! is 1*2*3*...*x for a positive value of x). The professor wanted me to prove the loop using a loop invariant; however, my argument was that it was correct "by definition", because the definition of "x!" for a positive integer x is "the product of the integers from 1 to x", and the for-loop in the else clause is simply a literal translation of this definition.

Is a loop invariant really needed as a proof of correctness in this case? How complicated must a loop be before a loop invariant (and proper initialization and termination conditions) become necessary for a proof of correctness?
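For reference, here is what I understand the invariant-based argument to look like, written as runnable assertions (the slowFactorial helper is just a reference definition I added for the purpose of checking the invariant):

```java
public class FactorialInvariant {
    public static int factorial(int x) {
        if (x < 0) {
            throw new IllegalArgumentException("Parameter must be >= 0");
        }
        int result = 1;
        for (int i = 1; i <= x; i++) {
            // Invariant: at the start of each iteration, result == (i - 1)!
            assert result == slowFactorial(i - 1);
            result *= i;
            // After the body, result == i!, re-establishing the invariant for i + 1.
        }
        // Termination: the loop exits with i == x + 1, so result == x!.
        return result;
    }

    // Reference definition, used only to state the invariant.
    private static int slowFactorial(int n) {
        return n == 0 ? 1 : n * slowFactorial(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
    }
}
```

(Run with -ea to enable the assertions; note that the x == 0 case collapses into the loop here, since a loop that never runs leaves result == 1 == 0!.)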

Additionally, I was wondering... how often are such formal proofs used in the industry? I have found that about half of my courses are very theoretical and proof-heavy and about half are very implementation and coding-heavy, without any formal or theoretical material. How much do these overlap in practice? If you do use proofs in the industry, when do you apply them (always, only if it's complicated, rarely, never)?

Edit
If we, ourselves, are convinced that a piece of code is correct, can convince others (informally) that it is correct, and there are unit tests in place, to what extent are formal proofs of correctness needed?

A: 

In recent years, "test-driven development" under various names has been about as far as most people bother to go in reasoning about their code. It's rather like very careful, repeatable experimentation versus logical deduction. Science versus mathematics!

There is some use of pre-conditions, post-conditions and loop/class invariants in languages like Eiffel, and the forthcoming "contracts" support in .NET 4.0 may help to popularise these ideas further.

Personally, I use assertions pretty infrequently these days. When I'm looping through a structure, I usually don't write it as a loop any more; I write it as a query, e.g. LINQ in C# or similar facilities in other languages such as JavaScript. So there is less imperative state manipulation to get wrong (usually there isn't any), and any assertion about the results would be redundant: it would simply restate the conditions already expressed in the query. In the query approach, you describe the results you want.
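For instance (sketched here in Java streams rather than LINQ, but the idea is the same — the query version states the result directly, so there is no accumulator to get wrong):

```java
import java.util.List;

public class QueryStyle {
    // Imperative version: a mutable accumulator and loop state to get wrong.
    static int sumOfEvensLoop(List<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {
                sum += n;
            }
        }
        return sum;
    }

    // Query version: describes the result we want; an assertion about it
    // would only restate the expression itself.
    static int sumOfEvensQuery(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(3, 1, 4, 1, 5, 9, 2, 6);
        System.out.println(sumOfEvensLoop(numbers));  // 12
        System.out.println(sumOfEvensQuery(numbers)); // 12
    }
}
```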

This is not to say that I never use assertions, but I tend to use them in conjunction with a unit test, and only for very involved algorithms that perform some complicated "in-place" mutation of a collection. In such cases there is no "built-in" way to ask for the results I want; I have to write the algorithm imperatively (perhaps because it would be horribly expensive to copy the entire data structure), so I cover it with assertions to help my unit tests flag internal problems.
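A toy illustration of that pattern — an in-place reversal where a snapshot-based assertion (far too expensive for production, which is exactly why it only fires when assertions are enabled under test) checks the internal state after every swap:

```java
import java.util.Arrays;

public class InPlaceReverse {
    static void reverse(int[] a) {
        // Snapshot for invariant checks only; in production this copy would be
        // the "horribly expensive" part, so it lives behind assertions.
        int[] original = Arrays.copyOf(a, a.length);
        int left = 0, right = a.length - 1;
        while (left < right) {
            int tmp = a[left];
            a[left] = a[right];
            a[right] = tmp;
            left++;
            right--;
            // Invariant: everything outside [left..right] is already in its
            // final, reversed position relative to the snapshot.
            for (int i = 0; i < left; i++) {
                assert a[i] == original[a.length - 1 - i];
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        reverse(data);
        System.out.println(Arrays.toString(data)); // [5, 4, 3, 2, 1]
    }
}
```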

Daniel Earwicker
I agree that TDD is more cost-effective in practice than formal reasoning. It's also true, though, that knowing about formal reasoning will make you a better programmer.
xpmatteo
+1  A: 

When you're solving hard problems and writing code that will be reused long after you've moved on, you (should) go through the process of proving the correctness of every routine you write, every day. Test driven development is a formalization of that idea, but the core of it is: you need to prove at least to yourself and preferably to others (code review!) that the code you've written will handle all possible inputs and paths in an appropriate manner.

Do we bicker about code invariants? No. Do we grade papers before you can check in? Sort of. If the team isn't comfortable with your code or your "proof", you go back to your box to fix it until it passes review.

dthorpe
As an aside: loop-invariant expressions are hoisted out of the loop by most optimizing compilers these days. Write the code for readability first, nitty-gritty minutiae second.
dthorpe
@dthorpe, I fully agree that we should be convinced (and able to convince others) that the code is correct... I was wondering more about the degree of formality that is needed to achieve this.
Michael Aaron Safyan
+3  A: 
The professor wanted me to prove the loop using a loop invariant;

Your professor wanted to make sure you understood loop invariants, not just prove something about a very simple function.

Is a loop invariant really needed as a proof of correctness in this case?

Well, technically, no. By that reasoning, you don't need to write a factorial function, either: just use a library function! But that's not the point of the exercise.

How complicated must a loop be before a loop invariant (and proper initialization and termination conditions) become necessary for a proof of correctness?

I know some smart people who can probably prove just about anything without invariants, and then there are people who need to use them even for trivial cases like the one above. That's like asking "how heavy does a rock have to be before you need a wheelbarrow to move it?".

Additionally, I was wondering... how often are such formal proofs used in the industry?

Written out explicitly? Probably rarely, unless you're in certain industries. But I still think about them when writing any but the most simple loop.

It's kind of like how I don't diagram sentences, but that doesn't mean I never think about grammar, especially if I'm writing some text that's really important. I can tell you what my pronoun's antecedent is, even though I'd never bother to put that fact on paper.

Ken
@Ken, the problem, to my recollection, did not specify how to prove the function correct. Would it have still been reasonable to assume/expect/require the use of an invariant in that case?
Michael Aaron Safyan
@Michael: It would have been reasonable to expect the use of a rigorous proof. It's not a difficult function, after all, and so you could use truly formal methods in the time needed for a test. I don't consider pointing at the function and claiming truth by definition to be a formal proof, myself, and obviously your professor didn't either. I think you were downgraded on that one fair and square.
David Thornley
@Ken, first of all, there was no downgrade; this was a placement exam and, in fact, I had never heard of loop invariants at the time (so it was ultimately useful as a teaching opportunity). However, my main question is where the line falls between notation and algorithm such that proof by invariant becomes necessary/reasonable. To me, at least, the example problem seems tantamount to proving that \sum_{i=1}^{n} f(i) is the same as f(1)+f(2)+...+f(n); I don't understand why it is even being proven -- it is just what we have defined the sigma notation to mean...
Michael Aaron Safyan
... by the same token, given the way the language defines the semantics of for-loops, the for-loop in the example above is just alternative notation for \prod_{i=1}^{x} i, and that expression, in our familiar math notation, is easily recognizable as the definition of x!. I'm not sure I buy the "truly formal" argument... how formal is "truly formal"? Should one rederive and prove the contrapositive law every time one makes a proof by contradiction? Hence my question of where to draw the line between rigorous proof and insanity.
Michael Aaron Safyan
David: I think those comments were intended for you.
Ken
+3  A: 

to what extent are formal proofs of correctness needed?

It depends, of course, but I think it's important for programmers to know how to write code that is not prone to errors, where it tends to be correct by construction.

One example is the concept of "look-ahead", such as in parsing, where the next token of input is not "read", then "looked at", and then possibly "put back" if it is not what is wanted, but rather "looked at" and then possibly "accepted" if it is what is wanted. When, for example, writing loops to cycle through database records and extracting subtotals, this simple change in perspective can result in much simpler and more reliable code.
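A minimal sketch of that "look at, then accept" style, using a hypothetical token stream over a list of strings (the names peek and accept are mine, not from any particular library):

```java
import java.util.List;

public class Lookahead {
    private final List<String> tokens;
    private int pos = 0;

    Lookahead(List<String> tokens) { this.tokens = tokens; }

    // Look at the next token without consuming it.
    String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consume the next token only if it matches; no "put back" is ever needed,
    // because nothing is read until we know we want it.
    boolean accept(String expected) {
        if (expected.equals(peek())) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Lookahead p = new Lookahead(List.of("(", "x", ")"));
        System.out.println(p.accept("(")); // true
        System.out.println(p.accept(")")); // false: "x" is next, nothing consumed
        System.out.println(p.accept("x")); // true
        System.out.println(p.accept(")")); // true
    }
}
```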

Another example is differential execution, a technique I stumbled on many years ago. It appears to allow any algorithm to be incrementally re-executed, so as to incrementally update its results. I use it extensively in user interfaces where the contents can dynamically change. For a long time, I felt that it worked in all cases, but couldn't be sure, until I finally proved it, as at the bottom of my Wikipedia page. After that, I knew that if I stuck to some simple constraints, I could rely on it to work, no matter how much code depended on it.

At the same time, we may have utmost confidence in the correctness of some algorithm, but find it very difficult to formally prove, because our proof techniques are poor. Consider the lowly bubble-sort. It obviously works, but try to prove it formally, by applying rules to source code. I've done it, but it is not easy. I haven't tried more advanced sorting algorithms.
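For comparison, here is bubble sort with its outer-loop invariant written out as assertions — easy to check dynamically on each pass, even though proving it formally from rules applied to the source is the hard part:

```java
import java.util.Arrays;

public class BubbleSort {
    static void bubbleSort(int[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            for (int i = 0; i < end; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
            // Outer-loop invariant, part 1: a[end..] is sorted.
            for (int j = end; j < a.length - 1; j++) {
                assert a[j] <= a[j + 1];
            }
            // Part 2: a[end] is >= every element before it, so the tail
            // holds the largest elements in their final positions.
            for (int j = 0; j < end; j++) {
                assert a[j] <= a[end];
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}
```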

Mike Dunlavey
@Mike, thank you for the examples where loop invariants serve as a useful tool, and also +1 for being reasonable and accepting the notion of "correct by construction" in the trivially provable cases.
Michael Aaron Safyan