views:

1182

answers:

12

Is there a more efficient way to grow a large array than creating a new one and copying over all the values? Like, concatenating it with a new one?

What about having fixed-size arrays stored in another array and reallocate / copy that top-level one? Would that leave the actual values in place?

I'm aware of ArrayList, but I need a lot of control about accessing the array and the access needs to be very fast. For instance, I think I prefer a[i] to al.get(i).

The main reason why I care about this is that the array in question (or a number of such arrays) might very well occupy a large enough portion of main memory that the usual strategy of creating a double sized copy before discarding the original might not work out. This may mean that I need to reconsider the overall strategy (or up my hardware recommendations).

I'm not too concerned about time efficiency (the operation will be rare), but rather about memory efficiency: Can I grow the array without temporarily having all the values twice?

+2  A: 

Have a look at System.arraycopy.

kgiannakakis
This is not what the OP is asking ... A copy of the array is still made when growing.
bruno conde
That might not have been clear in the first revision of my post, I made some clarifications later.
Hanno Fietz
+1  A: 

AFAIK the only way of growing or reducing an array is doing a System.arraycopy

   /**
    * Removes the element at the specified position in this list.
    * Shifts any subsequent elements to the left (subtracts one from their
    * indices).
    *
    * @param index the index of the element to removed.
    * @return the element that was removed from the list.
    * @throws    IndexOutOfBoundsException if index out of range <tt>(index
    *     &lt; 0 || index &gt;= length)</tt>.
    */
    public static <T> T[] removeArrayIndex(T[] src, int index) {
        Object[] tmp = src.clone();

        int size = tmp.length;
        if ((index < 0) && (index >= size)) {
            throw new ArrayIndexOutOfBoundsException(index);
        }

        int numMoved = size - index - 1;
        if (numMoved > 0) {
            System.arraycopy(tmp, index + 1, tmp, index, numMoved);
        }
        tmp[--size] = null; // Let gc do its work

        return (T[]) Arrays.copyOf(tmp, size - 1);
    }

   /**
    * Inserts the element at the specified position in this list.
    * Shifts any subsequent elements to the rigth (adds one to their indices).
    *
    * @param index the index of the element to inserted.
    * @return the element that is inserted in the list.
    * @throws    IndexOutOfBoundsException if index out of range <tt>(index
    *     &lt; 0 || index &gt;= length)</tt>.
    */
    public static <T> T[] insertArrayIndex(T[] src, Object newData, int index) {
        Object[] tmp = null;
        if (src == null) {
            tmp = new Object[index+1];
        } else {
            tmp = new Object[src.length+1];

            int size = tmp.length;
            if ((index < 0) && (index >= size)) {
                throw new ArrayIndexOutOfBoundsException(index);
            }

            System.arraycopy(src, 0, tmp, 0, index);
            System.arraycopy(src, index, tmp, index+1, src.length-index);
        }

        tmp[index] = newData;

        return (T[]) Arrays.copyOf(tmp, tmp.length);
    }
Carlos Tasada
I should look this up before I make myself look silly (but time is an issue ATM), but don't you need a throws clause in the method declaration for the ArrayIndexOutOfBoundsException, or is that unchecked?
Gazzonyx
I would need to double check, but this code is copy)
Carlos Tasada
ArrayIndexOutOfBoundsException is definitely unchecked.
hjhill
+15  A: 

The best way to have a dynamically resizing 'array' or list of items is to use an ArrayList.

Java has already built in very efficient resizing algorithms into that data structure.

But, if you must resize your own array, it is best to use System.arraycopy() or Arrays.copyOf().

Arrays.copyOf() can most simply be used like so:

int[] oldArr;
int newArr = Arrays.copyOf(oldArr, oldArr.length * 2);

This will give you a new array with the same elements as the old array, but now with room to spare.

The Arrays class in general has lots of great methods for dealing with arrays.

Also

It is important to make sure that you aren't just growing your array by one element each time an element is added. It is best to implement some strategy where you only have to resize the array every once in a while. Resizing arrays is a costly operation.

jjnguy
Well, the growing strategy would most likely be the classic "double each time".
Hanno Fietz
I amended my post to clarify that I'm not concerned about the time it takes to grow the array, but rather about the extra memory that's required by the operation. If this has to be 100% of the array size, I can't grow arrays but have to add arrays.
Hanno Fietz
+1 for a very good answer to the more sloppy first revision of my question!
Hanno Fietz
I'll keep this one here though, it still has some good information.
jjnguy
+1  A: 

Obviously, the important bit here is not if you concatenate the arrays or copy them over; what's more important is your array growing strategy. It's not hard to see that a very good way to grow an array is always doubling its size when it becomes full. This way, you will turn the cost of adding an element to O(1) as the actual growing stage will happen only relatively rarely.

DrJokepu
Yes, that's one important bit, but I was aware of that one. My question arises because I have a really large array that may occupy a large enough portion of memory that I can not just create a temporary copy.
Hanno Fietz
+6  A: 

Arrays are constant-size, so there is no way to grow them. You can only copy them, using System.arrayCopy to be efficient.


ArrayList does exactly what you need. It's optimized much better than any of us could do, unless you devote a considerable time to it. It uses internally System.arrayCopy.


Even more, if you have some huge phases where you need the list to grow/reduce, and others where it doesn't grow/reduce and you make thousands of read or write in it. Suppose also you have a huge performance need, that you prooved that ArrayList is too slow when read/writing. You could still use the ArrayList for one huge phase, and convert it to an array for the other. Note this would be effective only if your application phases are huge.

KLE
A: 

Heres a benchmark of time taken to add and remove elements from a collection/arraylist/vector

Narayan
That's Java 1.2 Those figures are probably very out of date.
Gazzonyx
+12  A: 

Is there a more efficient way to grow a large array than creating a new one and copying over all the values? Like, concatenating it with a new one?

No. And I'm pretty sure there's no language that offers such a feature without faking it by doing copying or indirection "under the hood". Once you allocate the space for the array and do something else, you most likely have other objects in memory right after the end of the array. At that point, it's fundamentally impossible to grow the array without copying it.

What about having fixed-size arrays stored in another array and reallocate / copy that top-level one? Would that leave the actual values in place?

You mean have an array of arrays and treat it as one large array consisting of a concatenation of the underlying arrays? Yes, that would work (the "faking it by doing indirection" approach), as in Java, Object[][] is simply an array of pointers to Object[] instances.

Michael Borgwardt
Thanks for commenting on my indirection idea, that's exactly what I was wondering.
Hanno Fietz
+1  A: 

Is the array itself large, or are you referencing large ReferenceTypes?

There is a difference between an array of a PrimitiveType with billions of elements, and an array with thousands of elements, but they refer to large class instances.

int[] largeArrayWithSmallElements = new int[1000000000000];
myClass[] smallArrayWithLargeElements = new myClass[10000];

Edit:

If you have performance considerations using ArrayList, I can assure you it will perform more or less exactly as Array indexing.

And if the application has limited memory resources, you can try to play around with the initial size of the ArrayList (one of it's constructors).

For optimal memory efficiency, you could create a container class with an ArrayList of Arrays.

Something like:

class DynamicList
{
    public long BufferSize;
    public long CurrentIndex;

    ArrayList al = new ArrayList();

    public DynamicList(long bufferSize)
    {
     BufferSize = bufferSize;

     al.add(new long[BufferSize]);
    }

    public void add(long val)
    {
     long[] array;

     int arrayIndex = (int)(CurrentIndex / BufferSize);

     if (arrayIndex > al.size() - 1)
     {
      array = new long[BufferSize];
      al.add(array);
     }
     else
     {
      array = (long[])al.get(arrayIndex);
     }

     array[CurrentIndex % BufferSize] = val;
    }

    public void removeLast()
    {
     CurrentIndex--;
    }

    public long get(long index)
    {
     long[] array;

     int arrayIndex = (int)(index / BufferSize);

     if (arrayIndex < al.size())
     {
      array = (long[])al.get(arrayIndex);
     }
     else
     {
      // throw Exception
     }

     return array[index % BufferSize];
    }
}

(my java is rusty, so please bear with me...)

Yannick M.
The array itself is large (millions), the element type is `long`. Also, there are several thousand such arrays.
Hanno Fietz
The difference being that copying the small array with large elements only takes up the space for the object references, not the objects?
Hanno Fietz
Exactly, hence my question, but since you have an array that takes a lot of memory, I would suggest a different approach when you have memory efficiency in mind.
Yannick M.
+3  A: 
NomeN
Wouldn't that mean you have a permanent copy of all the data in memory? With the memory limitations he talks about, don't think this will help.
Yannick M.
For my case, Yannick is right, but it's still an interesting idea, might be an inspiration for similar problems.
Hanno Fietz
Well if were to be used only as a backup to rebuild a fixed size array, you might aswell use an ArrayList :-)
Yannick M.
@Yannick If you'd use an arraylist, don't you have a full copy in memory again when you increase the size of the ArrayList.
NomeN
@Yannick (again, 1st comment now) I don't believe I'd have a copy of all data in memory, if I'm not parsing your comment correctly or miss some fundamentals here please set me straight.
NomeN
@ NomeNYou would have a full copy in memory since it increases the size with an ArrayCopy, however having a persistant copy in memory leaves me to believe that memory might not be such a problem after all
Yannick M.
what do you mean by 'it' in "since it increases the size with an ArrayCopy". If you mean the linked list, I meant a linked list you'd write yourself.
NomeN
I mean having a linked list of all the data + an array of the same data would mean you have all the data in memory, twice. At this point memory usage stops being a consideration, and you might aswell just use an ArrayList instead of the linked list.
Yannick M.
Aha, but thats the trick, the array does not contain the actual objects. It just keeps a reference to the object which resides in the linked list thus making the array lightweight.
NomeN
I think you are missing the fact that he has a very large array of primitives. Having a linkedlist of object references to primitives would give you for N elements in the array: N * 8 bytes for the primitives + N * 4 bytes for the object reference in the linked list + N * 4 bytes for the object reference of the next item in the chain + N * 8 bytes for the fixed size array of primitives. Long story short, you would use 3 times the memory of the fixed size array alone.
Yannick M.
I have not seen any mention of primitives, but in that case you would be right. Although the array would only be N*4 as it are references, but that's just nitpicking at that point ;-). Thx for your patience with me though!
NomeN
I am unsure how primitives are handled in java. In .NET referring to ValueTypes would require boxing. Not a good idea in this case I believe.
Yannick M.
It seems java does it in a similar way: http://java.sun.com/j2se/1.5.0/docs/guide/language/autoboxing.html
Yannick M.
+1  A: 

One way of doing this is having a linked list of array nodes. This is somewhat complex but the premise is this:

You have a linked list and each node within the list references an array. This way, your array can grow without ever copying. To grow you only need to add additional nodes at the end. Therefore the 'expensive' grow operation only occurs every M operations where M is the size of each node. Granted, this assumes that you always append to the end and you don't remove.

Insertion and removal in this structure is quite complicated, but if you can avoid them then that's perfect.

The only loss with this structure (ignoring insertion and deletion) is with the gets. The gets will be slightly longer; accessing the correct node requires accessing the correct node within the linked list and then fetching there. If there are a lot of accesses around the middle, this can be slow however there are tricks to speeding linked lists up.

Malaxeur
A: 

Have you looked at GNU Trove for highly efficient java collections? Their collections store primatives directly for much better memory usage.

Robert
+1  A: 

"Can I grow the array without temporarily having all the values twice?"

Even if you copy the arrays, you're only going to have all the values once. Unless you call clone() on your values, they're passed by reference into the new array.

If you already have your values in memory, the only additional memory expense when copying into a new array is allocating the new Object[] array, which doesn't take much memory at all, as it's just a list of pointers to value objects.

Sam Barnum
It's a large array of primitives. No object references. Hence copying the array does need at least twice the size of the array allocated.
Yannick M.