I have used Java, C++, .Net. (in that order). When asked about by-value vs. by-ref on interviews, I have always done well on that question ... perhaps because nobody went in-depth on it. Now I know that I do not see the whole picture.

I was looking at this section of code written by someone else:

XmlDocument doc = new XmlDocument();
AppendX(doc); // Real name of the function is different
AppendY(doc); // ditto

When I saw this code, I thought: wait a minute, shouldn't I use ref in front of the doc variable (and modify AppendX/Y accordingly)? It works as written, but it made me question whether I actually understand the ref keyword in C#.

As I thought about this more, I recalled early Java days (college intro language). A friend of mine looked at some code I had written and he had a mental block: he kept asking me which things are passed by reference and which by value. My ignorant response was something like: Dude, there is only one kind of arg passing in Java and I forgot which one it is :). Chill, do not over-think and just code.

Java still does not have a ref, does it? Yet, Java hackers seem to be productive. Anyhow, coding in C++ exposed me to this whole by reference business, and now I am confused.

Should ref be used in the example above?

I am guessing that when ref is applied to value types (primitives, enums, structures; is there anything else in this list?) it makes a big difference. And ... when applied to objects it does not, because it is all by reference. If things were so simple, then why would the compiler not restrict the usage of the ref keyword to a subset of types?

When it comes to objects, does ref serve as a comment of sorts? Well, I do remember that there can be problems with null, and ref is also useful for initializing multiple elements within a method (since you cannot return multiple things with the same ease as you would in Python).

Thanks.

+4  A: 

Object types are always passed by reference (actually the reference is passed by value). Integral types may be passed by reference or value.

Jamie Ide
The parenthesized bit is correct - but the bit before it isn't.
Jon Skeet
@Jon Skeet: It's the same thing, right? If you pass a reference to an object, then logically you're passing the object by reference. `ref` passes a *variable* by reference, not the object that the variable is referencing. (For the record, I have a precise understanding of what using `ref` does, and of the passing semantics when not using `ref`. I'm just wondering about your terminology.)
Joren
@Joren: No, passing by reference isn't the same thing as passing a reference. "Pass by reference" semantics imply that changes made to the parameter itself are made to the argument. Now in this case talking about an object itself being passed is inaccurate to start with, because in Java and C# there is no expression whose type is *actually* an object - it's *only* ever a reference. I've found it's a lot easier to explain this topic without using the idea of passing an actual *object* at all.
Jon Skeet
@Jon: Let's not talk about variables at all, but arbitrary expressions. If I call a method with some expression of type S, with S a struct, then the value of that expression is 'passed by value'. Since S is a value type, the object ('object' defined as 'instance of S') is thus passed by value. If I call the method with some expression of type C, with C a class, then the value of the expression is still passed by value. But since C is a reference type, the value in this case is a reference to an object. That object is then handled by use of a reference, which I would call 'passed by reference'.
Joren
I think we both agree that this is completely a different thing from using `ref`. (What `ref` does is also a far less general discussion as `ref` can only be used on variables, creating a form of expression that's of a completely different kind) That is why I'd say that `ref` is called 'ref' because it creates a reference to a *variable*, as opposed to saying anything about values, objects, etc. I particularly like Eric Lippert's explanation of the behaviour, and mentally pretend the word 'ref' is a synonym for 'alias'.
Joren
@Joren: You said it yourself: the value of the expression is still passed by value. I find it leads to a *much* simpler and more consistent mental model to stick with that, rather than sort of go "in and out" in terms of de-referencing and re-referencing by talking about the object being passed by reference. That way causes more confusion when you have `ref` on a reference type parameter - it leads people to believe it makes no difference.
Jon Skeet
@Jon: I think that in an 'ideal world' we wouldn't talk about variables being passed by value or by reference at all. And we wouldn't talk about values of expressions being passed by value or by reference either, because they can only sensibly be passed by value. But we would say that objects can be passed either by reference or by value, symmetric to how we name their types. In this world I think my model would be simple enough. But we don't live in that world, so – especially for educational purposes – I can see where you're coming from here. Thanks for indulging me in this discussion.
Joren
-1 for claiming object types are passed by reference. This is not a simplification of a reference being passed by value, it is completely different.
Brian
+2  A: 

No, don't use the ref keyword. You want to use the content of doc, not change doc itself. See this post for further explanation: ref keyword

Femaref
+3  A: 

A method that accepts a reference type variable passed by ref may change the thing the variable is pointing to - not just modify it.
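
A minimal sketch of that distinction, assuming two hypothetical helpers (Mutate and Replace are made up for illustration and are not from the question's code). Mutate only changes the object; Replace, because of ref, makes the caller's variable refer to a different object:

```csharp
using System;
using System.Text;

static class RefDemo
{
    // Hypothetical helper: modifies the object the caller's variable refers to.
    public static void Mutate(StringBuilder sb)
    {
        sb.Append(" world");
    }

    // Hypothetical helper: makes the caller's variable refer to a new object.
    public static void Replace(ref StringBuilder sb)
    {
        sb = new StringBuilder("replaced");
    }

    static void Main()
    {
        StringBuilder text = new StringBuilder("hello");
        StringBuilder original = text;

        Mutate(text);
        Console.WriteLine(text);                            // hello world
        Console.WriteLine(ReferenceEquals(text, original)); // True

        Replace(ref text);
        Console.WriteLine(text);                            // replaced
        Console.WriteLine(ReferenceEquals(text, original)); // False
    }
}
```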

Jeff Sternal
For a one-liner, that's pretty good.
Hamish Grubijan
+2  A: 

That depends on what you want AppendX to do. If it is modifying the contents of the object, it does not need to be passed by ref. If you wish AppendX to be able to change which object the variable "doc" points to, you need to. "doc" is already a "reference type", which is equivalent to it being a pointer to an object in C++.

csauve
+1  A: 

http://www.albahari.com/valuevsreftypes.aspx has a great explanation of the difference between Reference Types and Value Types in .Net.

TreDubZedd
+9  A: 

You're not alone in being confused.

By default, everything is passed by value in C# - but in the case of a reference type (such as XmlDocument) that value is a reference.

The ref keyword is used to indicate that a parameter is "pass by reference", and it has to be specified at the call site as well. Java doesn't have any equivalent - everything in Java is passed by value.
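
Applied to the code in the question, this could look roughly like the sketch below. The body of AppendX is a guess (the question does not show it), and Reassign is a hypothetical method added only to show that assigning to a by-value parameter does not affect the caller's variable:

```csharp
using System;
using System.Xml;

static class XmlDemo
{
    // Hypothetical body for AppendX; the real one is not shown in the question.
    // 'doc' is a copy of the caller's reference, so both refer to one document.
    public static void AppendX(XmlDocument doc)
    {
        XmlElement x = doc.CreateElement("x");
        doc.DocumentElement.AppendChild(x);
    }

    // Hypothetical: assigning to the parameter only rebinds the local copy.
    public static void Reassign(XmlDocument doc)
    {
        doc = new XmlDocument();
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root/>");

        AppendX(doc);  // the caller sees the appended element
        Reassign(doc); // the caller's variable still refers to the same document

        Console.WriteLine(doc.DocumentElement.ChildNodes.Count); // 1
    }
}
```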

See my article on parameter passing for a lot more detail.

Jon Skeet
I know you like feedback, so I thought your article was good but it is "sand-boxed" to .Net (and Java I guess) world. The other article makes references to C++, opens a can of worms and that is what I was seeking with this particular question. Before thinking that it is your fault, consider this: Richard Feynman, a bright guy who gave many lectures and had a passion for teaching and explaining admitted that there is no single teaching method, because the studentry is so diverse in ways they think and absorb. What worked for his son did little for his daughter. Too bad I can't find the link.
Hamish Grubijan
@Hamish: Yes, my article is very deliberately specific to .NET. As you say, there are lots of cans of worms in other languages. From the point of view of trying to teach someone the behaviour of C#, I find it's best to leave those cans closed :)
Jon Skeet
+1  A: 

I argued this with some coworkers and ultimately lost; here's how I learned to do things their way.

The ref and out keywords mean "I may/will replace the current reference with a new reference". After calling the method, the variable might refer to a completely different object. On the other hand, it's always known that for reference types, the properties of that type might change and if they were important you should cache them.
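
As a rough illustration of the "will replace" half of that rule, here is a sketch of out used to hand back two results at once. TryDivide is a hypothetical method made up for this example; the compiler requires every out parameter to be assigned before the method returns:

```csharp
using System;

static class OutDemo
{
    // Hypothetical method: both 'out' parameters are assigned on every path.
    public static bool TryDivide(int dividend, int divisor,
                                 out int quotient, out int remainder)
    {
        if (divisor == 0)
        {
            quotient = 0;
            remainder = 0;
            return false;
        }
        quotient = dividend / divisor;
        remainder = dividend % divisor;
        return true;
    }

    static void Main()
    {
        int q, r; // need not be initialized before being passed as 'out'
        if (TryDivide(17, 5, out q, out r))
        {
            Console.WriteLine(q + " remainder " + r); // 3 remainder 2
        }
    }
}
```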

I don't fully agree with it, but it definitely makes sense to have something to signify "the reference will change".

OwenP
+1  A: 

You're not alone in being confused, and coding in both C++ and C# I can understand why (not meant as criticism, just as a comment, since that's the world I live in too). In C++, when you use & on a parameter to pass by reference, you are basically saying "this is an alias I will use inside the function for the argument being passed in". Anything you do to that parameter has the same effect as if you had used the variable itself. So in code you can do:

void Foo(MyClass*& arg) { arg = new MyClass(1); }

MyClass* x = new MyClass(0);
Foo(x);

or

MyClass* x = new MyClass(0);
void Foo()
{
   x = new MyClass(1);
}
Foo();

In either case x now points to MyClass(1) (and you have a leak, because there's no way you can get to the original, but that's not my point). I guess from your question you knew that already, but it will serve a purpose anyway :)

In C#, the standard way of passing a reference is that the reference itself is passed by value. The parameter is no longer an alias: everything you do to the object being referenced will affect the object, but if you do anything to the variable referencing the object (e.g. assigning a new object), that will only affect the copy of the reference. Let's have some more code.

C#

MyClass x = new MyClass(0);
void Foo(MyClass arg) //reference being passed by val
{
  arg = new MyClass(1);
}
Foo(x);

x still equals MyClass(0)

MyClass x = new MyClass(0);
void Foo(ref MyClass arg) //passing by ref
{
  arg = new MyClass(1);
}
Foo(ref x);

x equals MyClass(1)

So the standard argument passing in C# differs from passing by reference in C++, but using the ref keyword (which is not encouraged) gives you close to the same mechanics as & in C++. In C++, & is usually encouraged for optimization, since it avoids copying. However, since you only copy the reference in C#, that's not a concern, and ref should only be used when you really need an alias for the variable being passed to the method, i.e. when you potentially have to assign a new object instance to the variable rather than use or change the state of an object.

Rune FS
+1  A: 

If you've coded in C++ before then you must be familiar with pointers. An object reference in .NET code and Java is a pointer under the hood. It just isn't written in a syntax that makes it obvious that it is a pointer; you are supposed to memorize it. The rule isn't very hard: any variable that refers to an object of a class, an array, a string or System.Object is a reference type and the variable is a pointer. Anything else is a value type and the variable contains the actual value.

When you pass such a variable to a method, you are passing the pointer value. The method can then modify the pointed-to object as it sees fit. Passing the pointer by reference doesn't make any difference, it is still the same pointer, pointing to the same object.

This is entirely different when you pass a value of a value type. If you want the called method to modify that value then you have to generate a pointer to that value. You do so by using the "ref" keyword in the method declaration (and at the call site).
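
A small sketch of the value-type case, assuming a hypothetical Counter struct and two made-up methods. Without ref the method increments its own copy; with ref it increments the caller's variable:

```csharp
using System;

// Hypothetical value type used only for this illustration.
struct Counter
{
    public int Value;
}

static class ValueTypeDemo
{
    // Receives a copy of the struct; the caller's value is untouched.
    public static void IncrementCopy(Counter c)
    {
        c.Value++;
    }

    // Receives an alias to the caller's variable; the change is visible.
    public static void IncrementInPlace(ref Counter c)
    {
        c.Value++;
    }

    static void Main()
    {
        Counter counter = new Counter();

        IncrementCopy(counter);
        Console.WriteLine(counter.Value); // 0

        IncrementInPlace(ref counter);
        Console.WriteLine(counter.Value); // 1
    }
}
```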

The outlier case is where you want the called method to return a new object. In other words, modify the pointer value. Then you have to use the ref keyword, which creates a pointer to a pointer. You'd typically avoid that by having the method return the object as the method return value.

Hans Passant
Thank you. Does the .Net framework treat the value types which fit into 64-bits, and the ones which do not differently? If so, then what exactly is that difference?
Hamish Grubijan
Only the JIT compiler cares. It depends on the processor type whether or not it stuffs a 64-bit value in a CPU register.
Hans Passant
+1  A: 

Lots of people are confused by this. Here's how I think of it. I don't think of "ref" as meaning "by reference" at all. I think of "ref" as meaning "alias". That is, when you say

void M(ref int x) { x = 123; }
void N(int z) { ... }
...
int y = 456;
M(ref y);
N(y);

what that "ref y" means is "please make the corresponding formal parameter x an alias to the variable y". That is, x and y now are both variables for the same storage location. When you write to x, you're writing to y because x is another name for y.

When you pass without ref, as in N(y) you're saying "y and z are two different variables such that z begins its lifetime with the same contents as y".

Once you start thinking about it like that you don't have to worry about pass-by-ref vs pass-by-value, blah blah blah, it's all very confusing. The key difference is that normal passing creates a new variable and initializes it with the argument, whereas ref makes an alias to an existing variable.

I wish we'd used "alias" instead of "ref"; it would have been much more clear.

Eric Lippert
I prefer ref. It's more compatible with my existing knowledge (i.e. ref means you're passing your reference *by reference* instead of *by value*).
Brian