tags:

views:

151

answers:

2

I looked a bit at dynamic arrays in D2, and I found them very difficult to understand. It also seems I'm interpreting the spec wrongly.. Working on a reference or slice of a dynamic array seems very error prone when changing the arrays... Or am I just not understanding the fundamentals?

Referring to the same array only shares the actual items:

auto a = [1];
auto b = a;
assert(&a != &b); // different instance; Doesn't share length
assert(a.ptr == b.ptr); // same items
assert(a == [1]);
assert(a == b);

As they reference the same array, changing one changes the other:

auto a = [1,2];
auto b = a;
a[1] = 20;
assert(a == [1,20]);
assert(a == b);

From the spec on array

To maximize efficiency, the runtime always tries to resize the array in place to avoid extra copying. It will always do a copy if the new size is larger and the array was not allocated via the new operator or a previous resize operation.

So changing the length doesn't neccesarily break the reference:

auto a = [1];
auto b = a;
b.length = 2;
assert(b == [1,0]);
assert(a == [1]); // a unchanged even if it refers to the same instance
assert(a.ptr == b.ptr);  // but still the same instance

// So updates to one works on the other
a[0]  = 10;
assert(a == [10]);
assert(b == [10,0]);

From the spec on array

Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array

auto a = [1];
auto b = a;
b ~= 2; // Should make a copy, right..?
assert(a == [1]);
assert(b == [1,2]);
assert(a != b);
assert(a4.ptr == b.ptr); // But it's still the same instance
a[0] = 10;
assert(b == [10,2]); // So changes to a changes b

But when the arrays would step on each other, the values are copied to a new location and the reference broken:

auto a = [1];
auto b = a;
b ~= 2;
assert(a == [1]);
assert(b == [1,2]);

a.length = 2; // Copies values to new memory location to not overwrite b's changes
assert(a.ptr != b.ptr);

Changing length of both arrays before making a change gives the same result as above (I would expect this given the above):

auto a = [1];
auto b = a;
a.length = 2;
b.length = 2;
a[1] = 2;
assert(a == [1,2]);
assert(b == [1,0]);
assert(a.ptr != b.ptr);

And the same when changing length or cancatenating (I would expect this given the above):

auto a = [1];
auto b = a;
b.length = 2;
a ~= 2;
assert(a == [1,2]);
assert(b == [1,0]);
assert(a.ptr != b.ptr);

But then slices also come into the picture, and suddenly it's even more complicated! The slices might be orphaned...

auto a = [1,2,3];
auto b = a;
auto slice = a[1..$]; // [2,3];
slice[0] = 20;
assert(a == [1,20,3]);
assert(a == b);

a.length = 4;
assert(a == [1,20,3,0]);
slice[0] = 200;
assert(b == [1,200,3]); // the reference to b is still valid.
assert(a == [1, 20, 3, 0]); // but the reference to a is now invalid..

b ~= 4;
// Now both references is invalid and the slice is orphan...
// What does the slice modify?
assert(a.ptr != b.ptr);
slice[0] = 2000;
assert(slice == [2000,3]);
assert(a == [1,20,3,0]); 
assert(b == [1,200,3,4]);

So... Is it bad practice to have multiple references to the same dynamic array? And passing slices around etc.? Or am I just way out here, missing the entire point of dynamic arrays in D?

+8  A: 

On the whole, you seem to understand things fairly well, but you appear to be misunderstanding the purpose of the ptr property. It does not indicate whether two arrays refer to the same instance. What it does is get you at the pointer to what is effectively the C array underneath. An array in D has its length as part of it, so it's more like it's a struct with a length and pointer to a C array than it is like a C array. ptr allows you to get at the C array and pass it to C or C++ code. You probably shouldn't be using it for anything in pure D code. If you want to test whether two array variables refer to the same instance, then you use the is operator (or !is to check that they're different instances):

assert(a is b);   //checks that they're the same instance
assert(a !is b);  //checks that they're *not* the same instance

All that ptr being equal for two arrays would indicate is that their first element is in the same place in memory. In particular, their lengths could differ. However, it does mean that any overlapping elements will get altered in both arrays if you alter them in one of them.

When changing the length of an array, D tries to avoid reallocating, but it could decide to reallocate, so you can't necessarily rely on whether it would reallocate or not. For instance, it's going to reallocate if not doing so will stomp on another array's memory (including those that have the same value for ptr). It could also reallocate if there isn't enough memory to resize itself in place. Basically, it will reallocate if not doing so will stomp on another array's memory, and it may or may not reallocate otherwise. So, it's generally not a good idea to rely on whether an array will reallocate or not when you set its length.

I would have expected concatenation to always copy per the docs, but per your tests, it does appear to act just like length does (I don't know whether that means that the docs need to be updated or whether it's a bug - my guess would be that the docs need to be updated). In either case, you certainly can't rely on other references to that array to still refer to the same array after concatenation.

As for slices, they work just as expected and are highly used in D - especially in the standard library, Phobos. A slice is a range for an array and ranges are a core concept in Phobos. However, just like many other ranges, altering the container that the range/slice is for could invalidate that range/slice. That's why when you're using functions which could resize containers in Phobos, you need to use the functions prepended with stable (e.g. stableRemove() or stableInsert()) if you don't want to risk invalidating the ranges that you have to that container.

Also, a slice is an array just like the array that it points to. So, naturally, altering its length or concatenating to it is going to follow all of the same rules as those for altering the length of or concatenating to any other array, and it could therefore be reallocated and no longer be a slice into another array.

Pretty much, you just need to be aware that altering the length of an array in any way could result in a reallocation, so you need to avoid doing that if you want references to continue to refer to the same array instance. And if you absolutely need to make sure that they do not point to the same reference, then you need to use dup to get a new copy of the array. If you don't mess with the length of an array at all, then array references (be they slices or references to the whole array) will continue to happily refer to the same array.

EDIT: It turns out that the docs need to be updated. Anything that could resize the array will try to do it in place if it can (so it might not reallocate) but will reallocate if it has to in order to avoid stomping on the memory of another array or if it doesn't have enough space to reallocate in place. So, there shouldn't be any distinction between resizing the array by setting its length property and resizing it by concatenating to it.

Jonathan M Davis
simendsjo
+1  A: 

Hey.

I didn't really want to make this into a full-blown answer, but I can't yet comment on the previous answer.

I think that concatenation and appending are two slightly different operations. If you use ~ with an array and an element, it's appending; with two arrays, it's concatenation.

You could try this instead:

a = a ~ 2;

And see if you get the same results.

Also, if you want to have defined behaviour, just use the .dup (or .idup for immutables) properties. This is also very useful if you have an array of references; you can modify the main array and .dup slices to work on out without worrying about race conditions.

EDIT: ok, I got it a bit wrong, but there it is anyway. Concatenation != appending.

//Max

awishformore
Yes, someone else just pointed that out for me, but I cannot find this in the spec... It just says that "a op= b" is the same as "a = a op b". Cannot find the array distinction in the spec..
simendsjo
I think I remember it from TDPL by Andrei (the book). At least I'm sure I've explicitly read this somewhere.
awishformore
Actually, `op` and `op=` are overloaded separately, so if the docs say otherwise, they're wrong. You overload `op` with `opBinary` and `op=` with `opAssign`. So, they could theoretically do totally separate things (though they shouldn't). I believe that it's done that way to allow for optimizations since `op=` can do stuff in place rather than having to create a new value with the result. So, if arrays don't act exactly the same with `~` as they do with `~=`, it's not particularly special.
Jonathan M Davis
Then the spec is wrong: http://digitalmars.com/d/2.0/expression.html#AssignExpression
simendsjo
Well the specs/docs are out-of-date for a lot of stuff, but there is a discussion going on in the newsgroup about updating it more thoroughly/regularly. Hopefully, this will be fixed soon and contributions to the language/standard library will include a change in the docs from now on.
awishformore