views:

161

answers:

6

Is it better to declare a variable used in a loop outside of the loop rather then inside? Sometimes I see examples where a variable is declared inside the loop. Does this effectively cause the program to allocate memory for a new variable each time the loop runs? Or is .NET smart enough to know that it's really the same variable.

For example see the code below from this answer.

public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[32768];
    while (true)
    {
        int read = input.Read (buffer, 0, buffer.Length);
        if (read <= 0)
            return;
        output.Write (buffer, 0, read);
    }
}

Would this modified version be any more efficent?

public static void CopyStream(Stream input, Stream output)
{
    int read; //OUTSIDE LOOP
    byte[] buffer = new byte[32768];
    while (true)
    {
        read = input.Read (buffer, 0, buffer.Length);
        if (read <= 0)
            return;
        output.Write (buffer, 0, read);
    }
}
+6  A: 

No, it wouldn't be more efficient. However, I'd rewrite it this way which happens to declare it outside the loop anyway:

byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
    output.Write(buffer, 0, read);
}

I'm not generally a fan of using side-effects in conditions, but effectively the Read method is giving you two bits of data: whether or not you've reached the end of the stream, and how much you've read. The while loop is now saying, "While we've managed to read some data... copy it."

It's a little bit like using int.TryParse:

if (int.TryParse(text, out value))
{
    // Use value
}

Again you're using a side-effect of calling the method in the condition. As I say, I don't make a habit out of doing this except for this particular pattern, when you're dealing with a method returning two bits of data.

The same thing comes up reading lines from a TextReader:

string line;
while ((line = reader.ReadLine()) != null)
{
    ...
}

To go back to your original question: if a variable is going to be initialized in every iteration of a loop and it's only used within the body of the loop, I'd almost always declare it within the loop. One minor exception here is if the variable is being captured by an anonymous function - at that point it will make a difference in behaviour, and I'd pick whichever form gave me the desired behaviour... but that's almost always the "declare inside" form anyway.

EDIT: When it comes to scoping, the code above does indeed leave the variable in a larger scope than it needs to be... but I believe it makes the loop clearer. You can always address this by introducing a new scope if you care to:

{
    int read;
    while (...)
    {
    }
}
Jon Skeet
I have to agree with the scope issue. When someone is having to read old code (or someone else's code), it is nice to have variable with proper scope. When I leave the scope, I no longer have to worry about what the last value of that variable might be.
Torlack
@Torlack: I'll edit to address this.
Jon Skeet
+1  A: 

I've generally preferred the latter as a matter of personal habit because, even if .NET is smart enough, other environments in which I might work later may not be smart enough. It could be nothing more than compiling down to an extra line of code inside the loop to re-initialize the variable, but it's still overhead.

Even if they're identical for all measurable purposes in any given example, I would say the latter has less of a chance of causing problems in the long run.

David
I disagree - because you've now got a variable with a larger scope than it needs, which is generally bad for readability. Personally I think you should adapt to the idioms of the environment you're working in - if you try to code C++, Java and C# all the same way, you'll end up with bigger problems than this.
Jon Skeet
Fair enough, and I definitely agree that one should use the tools properly rather then try to force them to look alike. I certainly wouldn't want to imply otherwise. I guess in this particular example the scope isn't a big deal because it ends immediately afterward anyway. There's definitely a lot to be said for the context of the code as a whole, as there's very rarely a global solution for all instances of similar problems.
David
+1  A: 

It really doesn't matter, and if I was reviewing the code for that particular example, I wouldn't care either way.

However, be aware that the two can mean very different things if you end up capturing the 'read' variable in a closure.

See this excellent post from Eric Lippert where this issue comes up regarding foreach loops - http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx

Alex Humphrey
+3  A: 

In the unlikely environment that doesn't help you with this, it would still be a micro-optimization. Factors like clarity and proper scoping is much more important than the edge case where this might just make next to no difference.

You should give your variables proper scope without thinking about performance. Of course, complex initializations are a different beast, so if something should only be initialized once but is only used within a loop, you'd still want to declare it outside.

+1 Even if the variable is allocated with each iteration of the loop, you'd probably want to declare it inside. The performance difference is most of the time negligible, but scoping isn't.
Helper Method
I agree, I was mostly talking about variables that are used but not changed inside the loop, but declared and initialized outside the loop, because the initialization of the object is non-trivial. As Jon Skeet mentioned in his answer, you can introduce a new scope to keep it outside the loop but still with proper scoping. This works very well with stuff like resource handles, in which case you can introduce a new scope with the using(...) construct.
+2  A: 

I am going to agree with most of these other answers with a caveat.

If you are using lambada expressions you must be careful with capturing variables.

static void Main(string[] args)
{
    var a = Enumerable.Range(1, 3);
    var b = a.GetEnumerator();
    int x;
    while(b.MoveNext())
    {
        x = b.Current;
        Task.Factory.StartNew(() => Console.WriteLine(x));
    }
    Console.ReadLine();
}

will give the result

3
3
3

Where

static void Main(string[] args)
{
    var a = Enumerable.Range(1, 3);
    var b = a.GetEnumerator();
    while(b.MoveNext())
    {
        int x = b.Current;
        Task.Factory.StartNew(() => Console.WriteLine(x));
    }
    Console.ReadLine();
}

will give the result

1
2
3

or some order there of. This is because when the task finally starts it will check the current value of it's reference to x. in the first example all 3 loops pointed at the same reference, where in the second example they all pointed at different references.

Scott Chamberlain
+1  A: 

As is the case with lots of simple optimizations like this, the compiler takes care of it for you. If you try both of these and look at the assemblies' IL in ildasm you can see that they both declare a single int32 read variable, although it does reorder the declarations:

  .locals init ([0] int32 read,
           [1] uint8[] buffer,
           [2] bool CS$4$0000)

  .locals init ([0] uint8[] buffer,
           [1] int32 read,
           [2] bool CS$4$0000)
John Bowen