views:

458

answers:

6

This question is about a data structure I thought of. It is a dynamic array, like std::vector<> in C++, except the removal algorithm is different.

In a normal dynamic array, when an element is removed, all the elements after it must be shifted down, which is O(n), unless it's the last element, in which case it's O(1).

In this one, when any element is removed, it is replaced by the last element and the array shrinks by one. This of course loses the ordering of the elements, but now removal of any element is constant time.
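
For concreteness, here's a minimal sketch of that removal in C++ (this trick is sometimes called "swap and pop"; the helper name unordered_remove is just for illustration):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Overwrite the element at index i with the last element, then shrink
    // the vector by one. O(1), but the order of the remaining elements is
    // not preserved.
    template <typename T>
    void unordered_remove(std::vector<T>& v, std::size_t i) {
        if (i + 1 < v.size())            // removing the last element needs no move
            v[i] = std::move(v.back());  // replace with the last element
        v.pop_back();                    // drop the now-redundant last slot
    }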

A list has the same removal time, but this structure also has random access. The caveat is that you don't know what you're accessing, since the ordering can get jumbled, so what use is random access anyway? Plus, a list won't invalidate any pointers/iterators to its elements.

So meh, this structure seems rather useless except for the very specific task of strictly walking through elements and perhaps removing them along the way. A list can do the same, but this has better cache performance.

So, does this strange/useless structure have a name, and does it have any uses? Or just a nice little brain storm?

+4  A: 

This idea is used in the Knuth (Fisher–Yates) shuffle. An element picked at random is replaced with the last one in the array. Since what we want is a random permutation anyway, the reordering doesn't matter.
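
For example, a sketch of the shuffle in C++ (names are illustrative): each iteration swaps a randomly picked element with the last element of the unshuffled prefix, which is exactly the "replace with the last one" move:

    #include <cstddef>
    #include <random>
    #include <utility>
    #include <vector>

    // Fisher-Yates / Knuth shuffle: pick a random element from the
    // not-yet-fixed prefix and swap it into that prefix's last slot.
    void knuth_shuffle(std::vector<int>& a, std::mt19937& rng) {
        for (std::size_t n = a.size(); n > 1; --n) {
            std::uniform_int_distribution<std::size_t> pick(0, n - 1);
            std::swap(a[pick(rng)], a[n - 1]);
        }
    }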

Rafał Dowgird
The reordering *does* matter; that's where the uniform randomness of the final permutation comes from!
ShreevatsaR
@ShreevatsaR: I mean the reordering of the elements not picked yet - they will be reordered anyway, so changes to their order introduced by picking other elements do not matter. Of course, this requires a (simple) proof that this reordering doesn't affect the uniformity of the final distribution.
Rafał Dowgird
Thanks, I chose this since it's closest to what I thought of, plus it makes me feel like I was almost as smart as Knuth.
GMan
A: 

Hm, does that algorithm really have O(1) removal time?

That would mean that

  1. Finding the element to remove is O(1)
  2. Finding the last element (which will replace the deleted element) is O(1)
  3. Finding the second-to-last element (the new "last" element) is O(1)

...which is not possible in any data structure I can come up with, although a doubly linked list could fulfill these constraints, given that you've already got a pointer to the element to remove.

Christoffer
What? Removal times have nothing to do with search times. So yes, this has O(1) removal time.
GMan
Store the size of the array SIZE and a pointer PTR in a struct representing the array. (1) is PTR+n, where n is the index of the element to remove, which is O(1). (2) is PTR+(SIZE-1), which is O(1). (3) is PTR+(SIZE-2), probably realized by SIZE--, but still O(1).
Kevin Montrose
A standard array? I believe GMan meant removal by index, not value.
Autoplectic
Ah, of course. I assumed removal by value; that's what has been relevant at work lately.
Christoffer
+2  A: 

I remember using this method plenty of times before. But I don't know a name for it.

Simple example: in a computer game you are iterating over all the "bad guys" and calculating their movements, etc. One thing that can happen to them is to disappear (their dead body has finished fading away and is 99% transparent now). At that point you remove them from the list as you describe, and resume iterating without increasing the iteration counter.
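
A sketch of that loop in C++ (BadGuy and its members are made up for illustration):

    #include <cstddef>
    #include <utility>
    #include <vector>

    struct BadGuy {
        bool dead = false;
        void update() { /* move, animate, fade out, ... */ }
    };

    // Update everyone and remove the dead in O(1) each. After a
    // swap-removal, i is deliberately NOT advanced, so the element
    // swapped in from the back still gets processed this frame.
    void update_all(std::vector<BadGuy>& guys) {
        for (std::size_t i = 0; i < guys.size(); /* no ++i here */) {
            guys[i].update();
            if (guys[i].dead) {
                guys[i] = std::move(guys.back());
                guys.pop_back();
            } else {
                ++i;
            }
        }
    }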

Something similar to this is done in a binary heap when deleting an item, but there the next step is to restore the heap property, which makes deletion O(log n).
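
For instance, popping the minimum of a binary min-heap stored in a vector starts with the same move (a sketch; the heap is assumed non-empty):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Remove the root by replacing it with the last element, then
    // sift down to restore the heap property: O(log n) overall.
    int heap_pop_min(std::vector<int>& h) {
        int top = h[0];
        h[0] = std::move(h.back());
        h.pop_back();
        std::size_t i = 0;
        for (;;) {
            std::size_t l = 2 * i + 1, r = 2 * i + 2, smallest = i;
            if (l < h.size() && h[l] < h[smallest]) smallest = l;
            if (r < h.size() && h[r] < h[smallest]) smallest = r;
            if (smallest == i) break;
            std::swap(h[i], h[smallest]);
            i = smallest;
        }
        return top;
    }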

yairchu
+3  A: 

So, does this strange/useless structure have a name, and does it have any uses?

I've used something similar in simulations of multi-process systems.

In a scheduler for processes implemented as state machines, each process is either waiting for an external event, active or completed. The scheduler has an array of pointers to the processes.

Initially each process is active, and the scheduler keeps two indices: one just past the last waiting process and one for the first completed process. These start at zero and at the length of the array, respectively.

V-- waiting
[ A-active, B-active, C-active, D-active ]
                             completed --^
^- run

To step the processes to their next states, the scheduler iterates over the array and runs each process in turn. If a process reports that it is waiting, it's swapped with the process just after the last waiting process in the array, and the waiting index is incremented.

           V-- waiting
[ A-waiting, B-active, C-active, D-active ]
                              completed --^
             ^- run

If it reports that it has completed, it's swapped with the process before the first completed process, and the completed index is decremented.

           V-- waiting
[ A-waiting, D-active, C-active, B-completed ]
                   completed --^
             ^- run

So as the scheduler runs and processes transition from active to waiting or completed, the array becomes ordered with all the waiting processes at the start, all the active ones in the middle, and the completed ones at the end.

                      V-- waiting
[ A-waiting, C-waiting, D-active, B-completed ]
                   completed --^
                        ^- run

After either a certain number of iterations, or when there are no more active processes, the completed processes are cleaned out of the array and external events are processed:

                      V-- waiting
[ A-waiting, C-waiting, D-completed, B-completed ]
          completed --^
                        ^- run == completed so stop

This is similar in that it uses swapping to remove items from a collection, but it removes items at both ends rather than just one, leaving the 'collection' of active processes in the middle.
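
A rough sketch of one pass of such a scheduler (Process, State, and run_pass are illustrative names, and step() is stubbed):

    #include <cstddef>
    #include <utility>
    #include <vector>

    enum class State { Waiting, Active, Completed };

    struct Process {
        State step() { return State::Active; }  // stub: run one step of the state machine
    };

    // One pass over the window [waiting, completed) of active processes.
    // Newly waiting processes are swapped to the front of the window and
    // newly completed ones to the back, keeping the array partitioned as
    // [ waiting... | active... | completed... ].
    void run_pass(std::vector<Process>& procs,
                  std::size_t& waiting, std::size_t& completed) {
        for (std::size_t run = waiting; run < completed; ) {
            switch (procs[run].step()) {
            case State::Waiting:
                // The process just after the waiting prefix has already
                // been run this pass, so advance past it after the swap.
                std::swap(procs[run], procs[waiting++]);
                ++run;
                break;
            case State::Completed:
                // The process swapped in from the back hasn't run yet,
                // so don't advance run.
                std::swap(procs[run], procs[--completed]);
                break;
            case State::Active:
                ++run;
                break;
            }
        }
    }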

Pete Kirkham
+1  A: 

It's called a Set.

Dave Gamble
Commonly, set element removal is O(log(n)).
Stefan Monov
A: 

I don't know of a name for it, but it is better than a list in certain cases.

In particular, this would be vastly superior to a singly or doubly linked list when the elements are small. Because you store everything contiguously, there's no extra pointer overhead per element.

Michael Anderson