Hi there,
I subscribe to a data feed and from its INSERT/DELETE messages create and maintain an array-backed structure, using the index values carried on those messages. I would like to ask the assembled cognoscenti whether they know of any algorithm that can deal with these piecemeal updates efficiently - a batch update generally contains between two and six such messages.
The estimated size of the array is around 1000 elements.
Batch updates arrive as a list of messages ordered by index, which stipulate the insertion or deletion of an item at a given index. I expect most of the churn in the array to be nearer its start than its end.
It occurs to me that with some basic processing I can determine the range affected by the batch and the overall size-delta, and therefore move an unaffected tail section of the array just once.
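A rough sketch of that single-tail-move idea, written as a left-to-right merge into a destination array (the Msg record and method names are mine, and I'm assuming each message's index refers to the array as it stands when that message is applied):

```java
/**
 * Sketch: apply an index-ordered batch of INSERT/DELETE messages in one
 * left-to-right merge, so the unaffected tail is copied exactly once.
 * The Msg record and "apply" are illustrative names, not the feed's API.
 */
public final class BatchMerge {
    public record Msg(boolean insert, int index, int value) {}

    public static int[] apply(int[] src, Msg[] batch) {
        int delta = 0;
        for (Msg m : batch) delta += m.insert() ? 1 : -1;
        int[] dst = new int[src.length + delta];
        int s = 0, d = 0;                        // read and write cursors
        for (Msg m : batch) {
            // Copy the untouched run up to this message's index.
            int run = m.index() - d;
            System.arraycopy(src, s, dst, d, run);
            s += run; d += run;
            if (m.insert()) dst[d++] = m.value(); // insert: write, don't consume
            else            s++;                  // delete: consume, don't write
        }
        // One copy for the entire unaffected tail.
        System.arraycopy(src, s, dst, d, src.length - s);
        return dst;
    }
}
```

Allocating a fresh destination each batch is wasteful at scale, but at around 1000 elements it keeps the merge simple and still gives the single tail copy.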
Similarly, I could maintain a certain amount of free space before the first element and after the final element, so that each update copies as little as possible.
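Something like the following is what I have in mind - a backing array with slack at both ends, shifting whichever side of the edit point is shorter (all the names here are illustrative):

```java
/**
 * Sketch: backing array with slack before and after the live elements,
 * so an edit shifts whichever side is shorter. Illustrative names only.
 */
public final class SlackBuffer {
    private int[] buf;
    private int head;   // index of the first live element
    private int size;

    public SlackBuffer(int capacity) {
        buf = new int[capacity];
        head = capacity / 3;          // leave slack before the first element
    }

    public int get(int i) { return buf[head + i]; }
    public int size()     { return size; }

    public void insert(int i, int value) {
        if (head == 0 || head + size == buf.length) grow();
        if (i < size - i) {           // front half shorter: shift it left
            System.arraycopy(buf, head, buf, head - 1, i);
            head--;
        } else {                      // back half shorter: shift it right
            System.arraycopy(buf, head + i, buf, head + i + 1, size - i);
        }
        buf[head + i] = value;
        size++;
    }

    public void delete(int i) {
        if (i < size - i) {           // close the gap from the shorter side
            System.arraycopy(buf, head, buf, head + 1, i);
            head++;
        } else {
            System.arraycopy(buf, head + i + 1, buf, head + i, size - i - 1);
        }
        size--;
    }

    private void grow() {             // recentre into a larger array
        int[] next = new int[Math.max(4, buf.length * 2)];
        int nextHead = (next.length - size) / 2;
        System.arraycopy(buf, head, next, nextHead, size);
        buf = next;
        head = nextHead;
    }
}
```

With most churn near the start of the array, the front slack means those edits shift the short prefix rather than the long tail.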
Other optimisations include recognising updates such as the following:
DELETE 10, INSERT 10 - effectively a replace which requires no copying
INSERT 10, DELETE 11 - as above
DELETE 10, DELETE 10, DELETE 10 - bulk deletion can be optimised into one copy operation
INSERT 11, INSERT 12, INSERT 13 - bulk insertion can be optimised into one copy operation
and so on.
However, I'm wary of the overhead of the recognition step - it smacks of look-ahead and backtracking, which may well take more time than simply performing the copy.
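That said, I suspect the recognition could be done in a single linear pass with only one message of look-ahead and no backtracking - a sketch, with the Msg/Op types invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: fold an index-ordered batch into coalesced ops in one linear
 * pass with a single message of look-ahead. Types are illustrative.
 */
public final class Coalesce {
    public record Msg(boolean insert, int index) {}
    public record Op(String kind, int index, int count) {} // REPLACE, INSERT_RUN, DELETE_RUN

    public static List<Op> fold(List<Msg> batch) {
        List<Op> ops = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            Msg m = batch.get(i);
            Msg next = i + 1 < batch.size() ? batch.get(i + 1) : null;
            // DELETE n, INSERT n  or  INSERT n, DELETE n+1  ->  in-place replace
            if (next != null && m.insert() != next.insert()
                    && next.index() == (m.insert() ? m.index() + 1 : m.index())) {
                ops.add(new Op("REPLACE", m.index(), 1));
                i++;                  // consume the look-ahead message
                continue;
            }
            // Run of deletes at the same index, or inserts at consecutive indices.
            int count = 1;
            while (i + 1 < batch.size()) {
                Msg n = batch.get(i + 1);
                boolean sameRun = n.insert() == m.insert()
                        && n.index() == (m.insert() ? m.index() + count : m.index());
                if (!sameRun) break;
                count++;
                i++;
            }
            ops.add(new Op(m.insert() ? "INSERT_RUN" : "DELETE_RUN", m.index(), count));
        }
        return ops;
    }
}
```

Since this never revisits a message, the recognition cost stays proportional to the batch size (two to six messages), which seems cheap next to even one array copy.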
Given the expected size of the array, tree structures seem heavyweight: some basic performance testing suggests that binary or self-balancing trees (in this case a list implementation backed by a red-black tree) only begin to show performance advantages at around 15K-20K elements; array copies are significantly faster below that. I should probably add that I'm using Java for this implementation.
Any hints, tips or suggestions would be welcomed.
Cheers
Mike