views:

210

answers:

5

I have an array, say:

var arr1 = new [] { 1, 2, 3, 4, 5, 6 };

Now, when my array-size exceeds 5, I want to resize the current array to 3, and create a new array that contains the upper 3 values, so after this action:

arr1 = new [] { 1, 2, 3 };
newArr = new [] { 4, 5, 6 };

What's the fastest way to do this? I guess I'll have to look into the unmanaged corner, but no clue.


Some more info:

  • The arrays have to be able to size up without large performance hits
  • The arrays will only contain Int32's
  • Purpose of the array is to group the numbers in my source array without having to sort the whole list

In short: I want to split the following input array:

int[] arr = new int[] { 1, 3, 4, 29, 31, 33, 35, 36, 37 };

into

arr1 =  1, 3, 4
arr2 =  29, 31, 33, 35, 36, 37

but because the ideal speed is reached with an array size of 3, arr2 should be split into 2 evenly sized arrays.

Note

I know that an array's implementation in memory is quite naive (well, at least it is in C, where you can manipulate the count of items in the array so the array resizes). Also that there is a memory move function somewhere in the Win32 API. So I guess this would be the fastest:

  1. Change arr1 so it only contains 3 items
  2. Create new array arr2 with size 3
  3. Memmove the bytes that aren't in arr1 anymore into arr2
+6  A: 

I'm not sure there's anything better than creating the empty arrays, and then using Array.Copy. I'd at least hope that's optimized internally :)

int[] firstChunk = new int[3];
int[] secondChunk = new int[3];
Array.Copy(arr1, 0, firstChunk, 0, 3);
Array.Copy(arr1, 3, secondChunk, 0, 3);

To be honest, for very small arrays the overhead of the method call may be greater than just explicitly assigning the elements - but I assume that in reality you'll be using slightly bigger ones :)

You might also consider not actually splitting the array, but instead using ArraySegment to have separate "chunks" of the array. Or perhaps use List<T> to start with... it's hard to know without a bit more context.

If speed is really critical, then unmanaged code using pointers may well be the fastest approach - but I would definitely check whether you really need to go there before venturing into unsafe code.

Jon Skeet
wow...never heard of an ArraySegment. That's interesting, I'll have to check that out. >.>
townsean
Presumably this is going to be done to keep an array from getting too large, so an `ArraySegment` is probably not the right tool.
Gabe
Yeah, know about the ArraySegments, yet my new array should have the ability to size up and down; and in that case have to edit the segment every time. The two array's shouldn't know about each other after splitting.
Jan Jongboom
@Jan: Do you need to be able to remove items as well? Giving more information in the question would really help. (Your comment in the question bit is a start, but not very concrete in terms of requirements.)
Jon Skeet
Check my edit :)
Jan Jongboom
Every time someone mentions `ArraySegment` I despair for what could have been... Why must you be so stupid, ArraySegment?
recursive
My experiments with `Array.Copy` have shown it to be rediculously fast. I now use it whenever possible, even in places where it is not at all appropriate, just because it’s so dang fast. :-) (I also love it for the ability to copy within a single array—even overlapping ranges—with no worries.)
Jeffrey L Whitledge
`ArraySegment`, on the other hand, is terrible (just as @recursive says). It is just a leaked implementation detail. Nothing to see there.
Jeffrey L Whitledge
+2  A: 

If your array will always contain 6 items how about:

var newarr1 = new []{oldarr[0], oldarr[1],oldarr[2]};
var newarr2 = new []{oldarr[3], oldarr[4],oldarr[5]};

Reading from memory is fast.

Achilles
+1: This is probably going to be the fastest approach - much faster than using LINQ, at least. Sometimes, the obvious answer is nice :)
Reed Copsey
@Reed thanks, it's not necessarily elegant, but provided the domain of the problem is fixed(size of array not changing) then it makes sense. We'll see if my answer outduels the great John Skeet.
Achilles
Still fact that you are creating two new arrays whilst there already is one.
Jan Jongboom
@Jan Not true, I'm creating references to already referenced memory.
Achilles
@Jan Jongboom: You can't resize arrays. If you have an array of size 6 and want two arrays of size 3, there is no other way than creating two arrays of size 3. "Resizing" an array refers to the procedure of creating a new array, copying all elements and discarding the old array. In performance critical applications it's best to avoid unnecessary array resizing, e.g., by creating an array of the maximum size and keeping a counter of elements actually used.
dtb
Hmm, don't agree there. An array is something in-memory like 'count of objects', 4 * count bytes of data. Don't know about C# here tho, but by changing the 'count of objects' bytes you should be able to really 'resize' the array.
Jan Jongboom
@Jan: Every approach will create 2 new arrays. LINQ, however, will create two iterators THEN two arrays, so the LINQ solutions will be far less efficient....
Reed Copsey
@Jan Jongboom: C#/.NET doesn't allow you to change the size of an array after it has been created. Sorry if this contradicts your opinion. :-)
dtb
No that's why I was pointing into the unsafe corner :-D
Jan Jongboom
When you instantiate an array in C#, it simply sets aside that many addresses as pointers to objects in memory. Even if the object pointed to does not exist, the pointer to the object is still there.
md5sum
Check my edit (and now there are more than 15 chars in this box)
Jan Jongboom
A: 

The obligatory LINQ solution:

if(arr1.Length > 5)
{
   var newArr = arr1.Skip(arr1.Length / 2).ToArray();
   arr1 = arr1.Take(arr1.Length / 2).ToArray();
}

LINQ is faster than you might think; this will basically be limited by the Framework's ability to spin through an IEnumerable (which on an array is pretty darn fast). This should execute in roughly linear time, and can accept any initial size of arr1.

KeithS
Lovely as LINQ is, it's not going to be the *fastest* approach :)
Jon Skeet
It's even damn slow :)
Jan Jongboom
True, but in the case of arrays it'll be a *fast* approach. Array.Copy would be faster, though, as the memory allocation behind the scenes would be in a big chunk; fewer roundtrips.
KeithS
This won't work. The array's `Length` property will always be the size which was given to it upon instantiation, regardless of the number of real objects currently pointed to.
md5sum
Sure it will. The result of ToArray() is a totally new instance of an array; arr1 is simply a reference to an array on the heap, and that reference does not, by itself, keep track of how big an array it originally pointed to. The Array, as a reference object on the stack, does that.
KeithS
@KeithS - That's not what I was referring to. The `if (arr1.Length > 5)` won't work. If you make a `int[] arr1 = new int[6]` and never put anything in it, it's `Length` will still be 6.
md5sum
@md5sum: ... and the responsibility for making sure the array actually got populated is outside the scope of this algorithm. Array.Copy() on an empty int[6] would get you two empty int[3]s. You gonna vote that one down?
KeithS
A: 

Since arrays are not dynamically resized in C#, this means your first array must have a minimum length of 5 or maximum length of 6, depending on your implementation. Then, you're going to have to dynamically create new statically sized arrays of 3 each time you need to split. Only after each split will your array items be in their natural order unless you make each new array a length of 5 or 6 as well and only add to the most recent. This approach means that each new array will have 2-3 extra pointers as well.

Unless you have a known number of items to go into your array BEFORE compiling the application, you're also going to have to have some form of holder for your dynamically created arrays, meaning you're going to have to have an array of arrays (a jagged array). Since your jagged array is also statically sized, you'll need to be able to dynamically recreate and resize it as each new dynamically created array is instantiated.

I'd say copying the items into the new array is the least of your worries here. You're looking at some pretty big performance hits as well as the array size(s) grow.


UPDATE: So, if this were absolutely required of me...

public class MyArrayClass
{
    private int[][] _master = new int[10][];
    private int[] _current = new int[3];
    private int _currentCount, _masterCount;

    public void Add(int number)
    {
        _current[_currentCount] = number;
        _currentCount += 1;
        if (_currentCount == _current.Length)
        {
            Array.Copy(_current,0,_master[_masterCount],0,3);
            _currentCount = 0;
            _current = new int[3];
            _masterCount += 1;
            if (_masterCount == _master.Length)
            {
                int[][] newMaster = new int[_master.Length + 10][];
                Array.Copy(_master, 0, newMaster, 0, _master.Length);
                _master = newMaster;
            }
        }
    }

    public int[][] GetMyArray()
    {
        return _master;
    }

    public int[] GetMinorArray(int index)
    {
        return _master[index];
    }

    public int GetItem(int MasterIndex, int MinorIndex)
    {
        return _master[MasterIndex][MinorIndex];
    }
}

Note: This probably isn't perfect code, it's a horrible way to implement things, and I would NEVER do this in production code.

md5sum
+4  A: 

Are you looking for something like this?

static unsafe void DoIt(int* ptr)
{
    Console.WriteLine(ptr[0]);
    Console.WriteLine(ptr[1]);
    Console.WriteLine(ptr[2]);
}

static unsafe void Main()
{
    var bytes = new byte[1024];
    new Random().NextBytes(bytes);

    fixed (byte* p = bytes)
    {
        for (int i = 0; i < bytes.Length; i += sizeof(int))
        {
            DoIt((int*)(p + i));
        }
    }

    Console.ReadKey();
}

This avoids creating new arrays (which cannot be resized, not even with unsafe code!) entirely and just passes a pointer into the array to some method which reads the first three integers.

dtb