views:

419

answers:

9

I am trying to make a remove method that works on an array implementation of a list. Can I set the the duplicate element to null to remove it? Assuming that the list is in order.

ArrayList a = new ArrayList[];

public void removeduplicates(){

    for(a[i].equals(a[i+1]){

        a[i+1] = null;

    }
    a[i+1] = a[i];
}
+3  A: 

No you can't remove an element from an array, as in making it shorter. Java arrays are fixed-size. You need to use an ArrayList for that.

If you set an element to null, the array will still have the same size, but with a null reference at that point.

 // Let's say a = [0,1,2,3,4] (Integer[])
 a[2] = null;
 // Now a = [0,1,null,3,4]
Martinho Fernandes
+1  A: 

Yes, you can set elements in an array to null, but code like a[i].equals(a[i+1]) will fail with a NullPointerException if the array contains nulls, so you just have to be more careful if you know that your array may contain nulls. It also doesn't change the size of the array so you will be wasting memory if you remove large numbers of elements. Fixed size arrays are generally not a good way to store data if you are often adding and removing elements - as you can guess from their name.

Mark Byers
A: 

No, an array element containing a null is still there, it just doesn't contain any useful value.

You could try moving every element from further down in the list up by 1 element to fill the gap, then you have a gap at the end of the array - the array will not shrink from doing this!

If you're doing this a lot, you can use System.arraycopy() to do this packing operation quickly.

Carl Smotricz
A: 

Use ArrayList.remove(int index).

  if(a[i].equals(foo()))
     a.remove(i)

But be careful when using for-loops and removing objects in arrays.

http://java.sun.com/j2se/1.3/docs/api/java/util/ArrayList.html

Vargen
Definitely not the answer: `a` is an array. You can't call `ArrayList.remove` on arrays...
Martinho Fernandes
+1  A: 

Can I set the the duplicate element to null to remove it?

You can set an element of the array null but this doesn't remove the element of the array... it just set the element to null (I feel like repeating the first sentence).

You should return a cleaned copy of the array instead. One way to do this would be to use an intermediary java.util.Set:

String[] data = {"A", "C", "B", "D", "A", "B", "E", "D", "B", "C"};

// Convert to a list to create a Set object
List<String> list = Arrays.asList(data);
Set<String> set = new HashSet<String>(list);

// Create an array to convert the Set back to array.
String[] result = new String[set.size()];
set.toArray(result);

Or maybe just use a java.util.Set :)

Pascal Thivent
+2  A: 

The straight-forward answer to your question is that setting an array or ArrayList element to null gives you a null entry in the array or ArrayList. This is not the same thing as removing the element. If just means that a[i] or a.get(i) will return null rather than the original element.

The code in the question is garbled. If you are going to use an ArrayList, the simplisitic solution would be something like this:

ArrayList a = new ArrayList();

public void removeduplicates() {
    for (int i = 0; i < a.size() - 1; ) {
        if (a.get(i).equals(a.get(i + 1)) {
            a.remove(i);
        } else {
            i++;
        }
    }
}

but in the worst case, that is O(N**2) because each call to remove copies all elements at indexes greater than the current value of i.

If you want to improve the performance, do something like this:

public ArrayList removeduplicates() {
    ArrayList res = new ArrayList(a.size());
    if (a.size() == 0) {
        return res;
    }
    res.add(a.get(0));
    for (int i = 1; i < a.size(); i++) {
        if (!a.get(i - 1).equals(a.get(i)) {
            res.add(a.get(i));
        }
    }
    return res;
}

(This is a quick hack. I'm sure it could be tidied up.)

Stephen C
which is why a linked list is better for this particular case.
Jherico
@Jherico ... it depends on what else you are doing with the list, and on the likelihood that you will be doing multiple removals. `O(N**2)` is WORST CASE behavior. Besides, I just added a version that is `O(N)` in the worst case.
Stephen C
@Stephen, but uses `O(N)` extra storage.
Martinho Fernandes
@Martinho - My point is that there *is no such thing* as a best solution unless you know the entire algorithmic context AND you can characterize the input data. For instance, in this case the most performant solution may actually be to replace elements with nulls!!
Stephen C
+2  A: 

Is this a homework question?

Your problem is analogous to the stream processing program uniq: Preserve -- by way of copying -- any element that doesn't match the one before it. It only removes all duplicates if the sequence is sorted. Otherwise, it only removes contiguous duplicates. That means you need to buffer at most one element (even if by reference) to use as a comparison predicate when deciding whether to keep an element occurring later in the sequence.

The only special case is the first element. As it should never match any preceding element, you can try to initialize your buffered "previous" element to some value that's out of the domain of the sequence type, or you can special-case your iteration with a "first element" flag or explicitly copy the first element outside the iteration -- minding the case where the sequence is empty, too.

Note that I did not propose you provide this operation as a destructive in-place algorithm. That would only be appropriate with a structure like a linked list with constant-time overhead for removing an element. As others note here, removing an element from an array or vector involves shuffling down successor elements to "fill the hole", which is of time complexity n in the number of successors.

seh
yes it is a homework question. Thanks that was helpful.
Ben Fossen
+1  A: 

Your code example was quite confusing. With ArrayList[] you showed an array of ArrayList objects.

Assuming that you're talking about just the java.util.ArrayList, then the most easy way to get rid of duplicates is to use a java.util.Set instead, as mentioned by others. If you really want to have, startwith, or end up with a List for some reasons then do:

List withDuplicates = new ArrayList() {{ add("foo"); add("bar"); add("waa"); add("foo"); add("bar"); }}; // Would rather have used Arrays#asList() here, but OK.
List withoutDuplicates = new ArrayList(new LinkedHashSet(withDuplicates));
System.out.println(withoutDuplicates); // [foo, bar, waa]

The LinkedHashSet is chosen here because it maintains the ordering. If you don't worry about the ordering, a HashSet is faster. But if you actually want to have it sorted, a TreeSet may be more of value.

On the other hand, if you're talking about a real array and you want to filter duplicates out of this without help of the (great) Collections framework, then you'd need to create another array and add items one by one to it while you check if the array doesn't already contain the to-be-added item. Here's a basic example (without help of Arrays.sort() and Arrays.binarySearch() which would have eased the task more, but then you would end up with a sorted array):

String[] array1 = new String[] {"foo", "bar", "foo", "waa", "bar"};
String[] array2 = new String[0];

loop:for (String array1item : array1) {
    for (String array2item : array2) {
        if (array1item.equals(array2item)) {
            continue loop;
        }
    }
    int length = array2.length;
    String[] temp = new String[length + 1];
    System.arraycopy(array2, 0, temp, 0, length);
    array2 = temp;
    array2[length] = array1item;
}

System.out.println(Arrays.toString(array2)); // [foo, bar, waa]

Hope this gives new insights.

BalusC
+1  A: 

If you are implementing your own list and you have decide to use a basic primitives storage mechanism. So using an array (rather than an arraylist) could be where you start.

For a simple implementation, your strategy should consider the following.

Decide how to expand your list. You could instantiate data blocks of 200 cells at a time. You would only use 199 because you might want to use the last cell to store the next allocation block.

Such linked list are horrible so you might decide to use a master block to store all the instances of blocks. You instantiate a master block of size 100. You start with one data block of 200 and store its ref in master[0]. As the list grows in size, you progressively store the ref of each new data blocks in master[1] .... master[99] and then you might have to recreate the master list to store 200 references.

For the reason of efficiency, when you delete a cell, you should not actually exterminate it immediately. You let it hang around until enough deletion has occurred for you to recreate the block.

You need to somehow flag a cell has been deleted. So the answer is obvious, of course you can set it to null because you are the king, the emperor, the dictator who decides how a cell is flagged as deleted. Using a null is a great and usual way. If you use null, then you have to disallow nulls from being inserted as data into your list class. You would have to throw an exception when such an attempt is made.

You have to design and write a garbage collection routine and strategy to compact the list by recreating blocks to remove nullified cells en mass. The JVM would not know those are "deleted" data.

You need a register to count the number of deletions and if that threshold is crossed, garbage collection would kick in. Or you have the programmer decide to invoke a compact() method. Because if deletions are sparse and distributed across various data blocks, might as well leave the null/deleted cells hang around. You could only merge adjacent blocks and only if the sum of holes in both blocks count up to 200, obviously.

Perhaps, when data is appended to a list, you deliberately leave null holes in between the data. It's like driving down the street and you see house addresses incremented by ten because the the city has decided that if people wish to build new houses in between existing houses. In that way you don't have to recreate and split a block every time an insertion occurs.

Therefore, the answer is obvious to yourself, of course you can write null to signify a cell is deleted, because it is your strategy in managing the list class.

Blessed Geek