views:

950

answers:

6

C and C++ allow passing structures and objects by value to functions, yet prevent passing arrays by value. Why?

+10  A: 

You can pass an array by value, but you have to first wrap it in a struct or class. Or simply use a type like std::vector.
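As a minimal sketch of the struct-wrapping idea: the type name IntArray10 and the function sum_and_clear below are made up for illustration and do not come from any library. Copying the struct copies the array member with it, so the callee works on its own copy.

```cpp
#include <cassert>

// Hypothetical wrapper type: the array is a member, so the struct's
// implicit copy constructor copies all ten elements.
struct IntArray10 {
    int data[10];
};

// Takes the wrapper by value: 'a' is a private copy of the caller's array.
int sum_and_clear(IntArray10 a) {
    int total = 0;
    for (int i = 0; i < 10; ++i) {
        total += a.data[i];
        a.data[i] = 0;  // modifies only the local copy
    }
    return total;
}
```

Calling sum_and_clear(w) on an IntArray10 w should leave w.data untouched, since only the copy is zeroed.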

I think the decision was for the sake of efficiency. One wouldn't want to do this most of the time. It's the same reasoning as why there are no unsigned doubles. There is no associated CPU instruction, so a language like C++ makes what is inefficient very hard to do.

As @litb mentioned: "C++1x and boost both have wrapped native arrays into structs providing std::array and boost::array which i would always prefer because it allows passing and returning of arrays within structs"
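A short sketch of what that looks like with std::array, which was standardized in C++11 (so this assumes a C++11 or later compiler). The function name reversed is invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Takes a std::array by value and returns one by value.
// The parameter 'a' is a copy, so the caller's array is not modified.
std::array<int, 4> reversed(std::array<int, 4> a) {
    for (std::size_t i = 0; i < a.size() / 2; ++i) {
        int tmp = a[i];
        a[i] = a[a.size() - 1 - i];
        a[a.size() - 1 - i] = tmp;
    }
    return a;
}
```

Neither the pass nor the return is possible with a plain int[4], which is the convenience the quote is pointing at.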

An array variable names the memory that holds the array, and its type includes the size. Note that it is not exactly the same as a pointer to the first element of the array.

Most people think that you have to pass an array as a pointer and specify the size as a separate parameter, but this is not needed. You can pass a reference to the actual array itself while maintaining its sizeof() status.

#include <cassert>

//Here you need the size because you have reduced
// your array to an int* pointing to the first element.
void test1(int *x, int size)
{
  //sizeof(x) is the size of a pointer (4 on a typical 32-bit platform)
  assert(sizeof(x) == sizeof(int *));
}

//This function can take in an array of size 10
void test2(int (&x)[10])
{
  //sizeof(x) is the size of the whole array (40 when int is 4 bytes)
  assert(sizeof(x) == 10 * sizeof(int));
}

//Same as test2 but by pointer
void test3(int (*x)[10])
{
  assert(sizeof(*x) == 10 * sizeof(int));
  //Note to access elements you need to do: (*x)[i]
}

Some people may say that the size of an array is not known. This is not true.

int x[10];
assert(sizeof(x) == 10 * sizeof(int)); // 40 when int is 4 bytes

But what about allocations on the heap? Allocations on the heap do not return an array. They return a pointer to the first element of an array. So new is not type safe. If you do indeed have an array variable, then you will know the size of what it holds.
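To make the heap case concrete, here is a small sketch assuming a C++11 compiler (for decltype and std::is_same). The function name heap_vs_declared is invented for illustration; it only compares the types the compiler sees.

```cpp
#include <cassert>
#include <type_traits>

// Returns true if all four observations below hold.
bool heap_vs_declared() {
    int declared[10] = {0};
    int* heap = new int[10];  // new[] yields int*, not int[10]
    heap[0] = declared[0];

    // The declared array's type still carries its extent.
    bool declared_is_array = std::is_same<decltype(declared), int[10]>::value;
    bool size_known = sizeof(declared) == 10 * sizeof(int);

    // The heap variable is just a pointer; the element count is not in its type.
    bool heap_is_pointer = std::is_same<decltype(heap), int*>::value;
    bool size_lost = sizeof(heap) == sizeof(int*);

    delete[] heap;
    return declared_is_array && size_known && heap_is_pointer && size_lost;
}
```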

Brian R. Bondy
You see that [10] in the function parameter? That's where the size is coming from. The reference has nothing to do with it.
anon
it's more type safe than the test1
Brian R. Bondy
@Neil Butterworth: The size of an array is part of the array type itself. If you wanted variable size you'd be better off with an std::vector.
Brian R. Bondy
You said that "people think you need to specify the size". People are right, as your needlessly complex code demonstrates.
anon
It has everything to do with it. Had you said void test2(int x[10]); you would pass nothing more than a simple pointer, and the "10" there is completely ignored and useless (even dangerous). Now the reference accepts the parameter by-reference and avoids conversion of the argument to pointer.
Johannes Schaub - litb
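A rough sketch of the difference described in the comment above, assuming C++11 for decltype and std::is_same. The names takes_pointer and takes_array_ref are made up for the example.

```cpp
#include <cassert>
#include <type_traits>

// The "10" is ignored here: the parameter type is adjusted to int*.
bool takes_pointer(int x[10]) {
    return std::is_same<decltype(x), int*>::value;
}

// Binding by reference keeps the full type int[10], extent included.
bool takes_array_ref(int (&x)[10]) {
    return std::is_same<decltype(x), int (&)[10]>::value
        && sizeof(x) == 10 * sizeof(int);
}
```

takes_pointer accepts an int[20] just as readily as an int[10], whereas passing an int[20] to takes_array_ref should be rejected at compile time.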
@Neil Butterworth: Thanks for pointing out what needed clarification. I corrected my description to say "specify the size as a separate parameter". My above code demonstrates that there is a difference between a pointer to the first element and a pointer to an array.
Brian R. Bondy
But you still need to know the size! My point is that using sizeof() in the function is pointless because you already know the size of the array is 10.
anon
I think the above nicely demonstrates what the difference between a pointer and an array pointer is. The second variant does not pass the size, it is just specifying the type. Which happens to contain the size. It is important to know because someone might use code as in test1 with sizeof.
Brian R. Bondy
Brian yeah i like that sample too. look at bottom of http://stackoverflow.com/questions/275994/whats-the-best-way-to-do-a-backwards-loop-in-c-c-c/276053#276053 too for a sizeof replacement for arrays.
Johannes Schaub - litb
worthwhile to mention C++1x and boost both have wrapped native arrays into structs providing std::array<T, N> and boost::array<T, N> which i would always prefer because it allows passing and returning of arrays within structs
Johannes Schaub - litb
@litb: nice array_size implementation, very cool
Brian R. Bondy
+1, nice clear explanation.
j_random_hacker
@Neil: It is in fact possible to capture the size of the array without specifying it in the function declaration by turning the function into a function template, with the array size as a non-type (i.e. integral) template parameter.
j_random_hacker
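The function-template approach from the comment above might look roughly like this. The names array_length and sum are illustrative only; N is deduced from the argument's array type, so nothing is hard-coded in the declaration.

```cpp
#include <cassert>
#include <cstddef>

// N is a non-type template parameter deduced from the array argument.
template <typename T, std::size_t N>
std::size_t array_length(T (&)[N]) {
    return N;
}

// Works for any extent; each distinct N instantiates its own function.
template <std::size_t N>
int sum(const int (&values)[N]) {
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += values[i];
    }
    return total;
}
```

The trade-off is that the array is still passed by reference, not by value, and a separate instantiation is generated per size.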
thanks for the praise, Brian. appreciated :)
Johannes Schaub - litb
+6  A: 

EDIT: I've left the original answer below, but I believe most of the value is now in the comments. I've made it community wiki, so if anyone involved in the subsequent conversation wants to edit the answer to reflect that information, feel free.

Original answer

For one thing, how would it know how much stack to allocate? That's fixed for structures and objects (I believe) but with an array it would depend on how big the array is, which isn't known until execution time. (Even if each caller knew at compile-time, there could be different callers with different array sizes.) You could force a particular array size in the parameter declaration, but that seems a bit strange.

Beyond that, as Brian says there's the matter of efficiency.

What would you want to achieve through all of this? Is it a matter of wanting to make sure that the contents of the original array aren't changed?

Jon Skeet
char x[10]; assert(sizeof(x) == 10);
Brian R. Bondy
Brian is right. arrays in C++ and C89 always have sizes known at compile time :) but i think your answer still has to do with it. there are many situations where an array decays into a pointer. and making arrays pass to functions would for one be dangerous (precisely because of that decay) ...
Johannes Schaub - litb
and useless to a certain degree, because we would be limited for passing one size (i.e int[42] - but not int[41]).
Johannes Schaub - litb
i believe your answer points out exactly that, just from a point of view which expects to make it work to accept different array sizes but which would not work in C++/C because it would need to accept array sizes that have different sizes. so i +1 anyway :D
Johannes Schaub - litb
The size of the array is part of the type itself. If you wanted variable size you'd be better off with an std::vector.
Brian R. Bondy
@litb: arrays certainly do *not* always have known sizes at compile time! An array allocated on the heap, for instance, can have an arbitrary size that varies depending on the state and inputs of the program.
thehouse
i meant when the argument has an array type, then that size has to be known. passing an array on the heap around by value wouldn't be possible because one would need the size at compile time. now i see what jon means, thanks thehouse :)
Johannes Schaub - litb
@thehouse: new loses some type information, i.e. it is not type safe for arrays. It will cast away the array part of the pointer. But if you indeed have a typesafe variable that holds an array, it will always know its size.
Brian R. Bondy
...The type of an array includes its size. But it can be reduced to a simple pointer to the first element, hence losing the size. My point is that arrays and pointers are distinct beasts, but often treated together.
Brian R. Bondy
Is this answer still useful? I'm not convinced it is, but I'm loath to delete it when there's potentially useful information in the comments. Suggestions anyone?
Jon Skeet
@Jon: I would leave it for the comments. About your last comment: you are right, if the user does not want the contents to be changed, the language (C++, not C) has that facility through the use of the const keyword.
David Rodríguez - dribeas
Okay, I'll leave it but put a note in the answer to point to the comments :)
Jon Skeet
+14  A: 

In C/C++, internally, an array is passed as a pointer to some location, and basically, it is passed by value. The thing is, that copied value represents a memory address to the same location.

In C++, a vector<T> is copied and passed to another function, by the way.
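A small sketch of the "address passed by value" point: the function name modify is invented for the example. Writing through the parameter reaches the caller's array, while reassigning the parameter only changes the local copy of the address.

```cpp
#include <cassert>

// 'a' holds a copy of the address of the caller's first element.
void modify(int a[], int replacement[]) {
    a[0] = 42;        // writes through the copied address: the caller sees this
    a = replacement;  // rebinds only the local pointer copy
    a[0] = -1;        // now writes into 'replacement', not the original array
}
```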

Mehrdad Afshari
Best answer! I think though it should also emphasize that C/C++ does not *prevent* passing the array by value.
hasen j
This answer confuses arrays with pointers. It may sound simple, but it indeed causes confusion i think. It's important to keep the concepts of arrays and pointers apart, to understand their fundamental difference in the first place.
Johannes Schaub - litb
@litb: Well, yeah, it's pretty important to know that it's not precisely the same as passing a pointer from the compiler's point of view (e.g. multidimensional arrays), but it's equally important to emphasize that the actual array address is *copied*, not passed by reference.
Mehrdad Afshari
An array is indeed more than a pointer, otherwise for char a[1]; sizeof(a) would equal sizeof(char*) instead of being 1.
anon
@Neil: I think that's exactly the thing `litb` pointed out. The real problem is the concept you might want to define as an array. While I admit litb's comment is perfectly valid, I'm not able to explain it in a more precise way. I think it's already obvious what I mean.
Mehrdad Afshari
Yes, i think that is the fundamental problem. Both of us are right i think. Some talk about the concept of an array, and thus include buffers allocated at the heap and pointed to by a pointer, and some talk about declared arrays as they exist in C and C++ as aggregate objects.
Johannes Schaub - litb
@litb: I think the problem is that new always returns a plain pointer to the element type (e.g. int*) even for arrays. It would be better if it returned an int (*)[10] when you allocated that, but then you wouldn't be allowed to have dynamic allocations. I think you could hack something together with templates to get typesafe new
Brian R. Bondy
Brian, yeah i agree, that special handling somewhat knuddles pointers and arrays together (btw, here is teh pet peeve http://stackoverflow.com/questions/423823/whats-your-favorite-programmer-ignorance-pet-peeve/484900#484900 :p)
Johannes Schaub - litb
nice :), I don't think everyone understands that new doesn't return an array it returns a pointer to the first element of an array. The question on this page asks about arrays though, not pointers to the first element of an array.
Brian R. Bondy
@Brian: All goes back to the definition of the array. Is it the buffer or the concept as litb pointed out? Isn't the pointer of the first element called an array? or is it? Is a vector an array (as it stores a sequence of elements linearly, which is one definition of an array)?
Mehrdad Afshari
@Mehrdad: I think the definition of an "array type" is a variable that has a pointer to the first element and a known size. Otherwise what you have is just a pointer.
Brian R. Bondy
A buffer is what an array points to, but I don't think of it as part of the array type.
Brian R. Bondy
It's not part of an array *type*, but it can probably be considered a part of an array. You are certainly right about array type, and the way compiler treats arrays. Anyhow, I think it does not matter that much but the real controversial thing is the definition.
Mehrdad Afshari
Ya I guess basically if you consider array == array type.
Brian R. Bondy
Mehrdad, it's quite clear what an array is. a pointer is not an array. a vector is not, a boost::array is not... just a T[N] is. the other things either emulate an array, or model the conceptual array. But they are not real arrays.
Johannes Schaub - litb
if you say an array is nothing more than a pointer, then that's plain wrong. a lambda expression then is nothing more than a chunk of bits. everything is the same then, if you abstract the language rules away and look at them at the assembler/machine code level
Johannes Schaub - litb
@litb: Right. That statement should be clarified I think... This one is better.
Mehrdad Afshari
@litb: Your last comment is terrific. It made me think of an answer like "an array is a sequence of electrons that get around the machine :))"
Mehrdad Afshari
@Mehrdad: I know what you're trying to say, but I think your current explanation is very close to wrong. The problem is that it blurs the line -- *all* cases where pointer semantics are used instead of value semantics can be explained as "well, it's really value semantics on the underlying address".
j_random_hacker
Yes, but isn't it the right thing nevertheless?! I mean, considering the question is a design decision on the way C works, and not a how-to question, I think it's OK to give the answer "this is basically the only way *C* works, everything is call by value, but that thing is now an address"
Mehrdad Afshari
Mehrdad, hehe great i made you think of something fun :p dunno, i thought as you have it now it makes sense - the pointer is passed by value. after all it is, i think.
Johannes Schaub - litb
C has two things: the "locator value" (= lvalue) and the "value of an expression" (= rvalue), as it terms those in a note. While there are rvalue arrays (arrays wrapped in a struct and returned from a function), the "value of an array in an expression" is intended to be the pointer i believe...
Johannes Schaub - litb
@Mehrdad: Actually your answer is correct, it's just a question of emphasis. My reading of your answer (and maybe I'm alone :) ) is that you're trying to say "C *does* use value semantics everywhere" (using a broadened definition of "value semantics") which is confusing I think. :/
j_random_hacker
+3  A: 

I'm not actually aware of any languages that support passing naked arrays by value. To do so would not be particularly useful and would quickly chomp up the call stack.

Edit: To downvoters - if you know better, please let us all know.

anon
C++ vector<...> objects, when passed by value, are copied. And although this wastes (heap-)memory, it won't "quickly chomp up the call stack".
vog
The native array inside a vector is passed by reference - it's a pointer. C++ native arrays are exactly the same as C arrays.
anon
MATLAB has value semantics even for arrays. This is one reason why non-trivial matlab programs often use a huge amount of memory, though current versions of the interpreter use copy-on-write to reduce the memory usage.
janneb
Copy on write is passing by ref (R does the same). The fact the array may grow later (depending on use) is not the issue.
anon
i agree with Neil. i know no language which can pass arrays by value. C# and Java pass them not at all (not even by reference), C++ can pass them by reference only. And C can't pass them at all either, only passing a pointer to their first element.
Johannes Schaub - litb
though, is passing arrays contained in structs passing-of-arrays? probably one could argue about that hours long :D
Johannes Schaub - litb
By native, I meant naked - I'll change my answer.
anon
No, as I said, MATLAB has value semantics, not reference semantics. When you call a function, semantically you get copies of the arguments rather than references to them. That the interpreter can defer copying is an implementation detail; perhaps I shouldn't have brought that up, confusing the issue
janneb
i don't know any language that does this, but certainly there are languages that do so. not saying you are wrong. just saying i don't know one that does so. i imagine it could have benefits where languages do long calcs with the arguments - aliasing could matter there. thanks for the insight janneb
Johannes Schaub - litb
now that you told me matlab has value semantics, indeed i will have to correct myself and say i know at least one language that does :)
Johannes Schaub - litb
You can pass arrays in Ada as "in" arguments. This is conceptually like passing them by value. However you can't modify "in" parameters. And certainly, under the sheets, the compiler is passing them by reference for efficiency.
Brian Neal
@vog: Passing by value consumes stack. Passing vectors by value does not in so much as the memory is indeed allocated on the heap. The vector internals are copied, the copy constructor will allocate new heap memory and copy the arrays. No reference to the original is passed.
David Rodríguez - dribeas
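The vector copy described in the comment above can be observed with a short sketch, assuming C++11 for vector::data(). The function name copy_has_own_buffer is made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Takes the vector by value. Returns true if the copy's heap buffer
// is at a different address than the original's buffer.
bool copy_has_own_buffer(std::vector<int> copy, const int* original_data) {
    bool distinct = copy.data() != original_data;
    copy[0] = 99;  // touches only the copy's buffer
    return distinct;
}
```

Calling it as copy_has_own_buffer(v, v.data()) on a non-empty vector v should return true and leave v unchanged.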
'copy the arrays' should be: copy the array contents. @janneb: Copy on write is quite troublesome in multithreaded programs. Either you enforce locking on each access to the element, affecting performance, or you end up playing Russian roulette with pointers.
David Rodríguez - dribeas
@dribeas: That might be; I'm not arguing that the MATLAB way is better or worse, just saying how it is.
janneb
@Brian Neal: In Fortran you can set the intent(in) attribute for arguments, which is similar to Ada. But as you say yourself, this is not really value semantics. More like passing objects via const reference in C++.
janneb
+3  A: 

This is one of those "just because" answers. C++ inherited it from C, and had to follow it to keep compatibility. It was done that way in C for efficiency. You would rarely want to make a copy of a large array (remember, think PDP-11 here) on the stack to pass it to a function.

Brian Neal
+4  A: 

I think that there are 3 main reasons why arrays are passed as pointers in C instead of by value. The first 2 are mentioned in other answers:

  • efficiency
  • because there's no size information for arrays in general (if you include dynamically allocated arrays)

However, I think a third reason is due to:

  • the evolution of C from earlier languages like B and BCPL, where arrays were actually implemented as a pointer to the array data

Dennis Ritchie talks about the early evolution of C from languages like BCPL and B, and in particular about how C arrays are implemented, how they were influenced by BCPL and B arrays, and how and why they differ (while remaining very similar in expressions, because array names decay into pointers in expressions).

Michael Burr