I am using WEKA, a data mining tool, on microarray data. How can I remove redundant (duplicate) tuples from an existing data set? The code to remove the redundancy should be in Java.

For example, suppose the data set contains:

H,A,X,1,3,1,1,1,1,1,0,0,0

D,R,O,1,3,1,1,2,1,1,0,0,0

H,A,X,1,3,1,1,1,1,1,0,0,0

C,S,O,1,3,1,1,2,1,1,0,0,0

H,A,X,1,3,1,1,1,1,1,0,0,0

Here tuples 1, 3, and 5 are identical, so two of them are redundant.

The code should return the following data set with the redundancy removed:

H,A,X,1,3,1,1,1,1,1,0,0,0

D,R,O,1,3,1,1,2,1,1,0,0,0

C,S,O,1,3,1,1,2,1,1,0,0,0

+1  A: 

You could use one of the classes that implement the Set interface, such as java.util.HashSet.

You can load your data set into the Set and then extract the unique tuples, either by converting the Set to an array via the Set.toArray() method or by iterating over the set.

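import java.util.HashSet;
import java.util.Set;

// Assumes tupleList is an existing List<Tuple> holding the loaded data set.
Set<Tuple> tupleSet = new HashSet<Tuple>();

// A Set leaves itself unchanged when asked to add an element it already contains.
for (Tuple tuple : tupleList) {
    tupleSet.add(tuple);
}

// Now all of your tuples are unique (HashSet does not guarantee iteration order).
for (Tuple tuple : tupleSet) {
    System.out.println("tuple: " + tuple);
}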
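
If the rows are kept as plain strings, a minimal sketch of the same idea might look like the following. It swaps in java.util.LinkedHashSet for HashSet so that the result keeps the order of first appearance, as in the expected output in the question. The class name DedupSketch and the method removeDuplicates are made up for illustration and are not part of WEKA, and the rows are hard-coded here rather than read from a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper class for illustration; not part of WEKA.
class DedupSketch {

    // Returns the rows with exact duplicates dropped, keeping first-seen order.
    static List<String> removeDuplicates(List<String> rows) {
        // LinkedHashSet rejects repeated elements and remembers insertion order.
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Sample rows copied from the question.
        List<String> rows = Arrays.asList(
            "H,A,X,1,3,1,1,1,1,1,0,0,0",
            "D,R,O,1,3,1,1,2,1,1,0,0,0",
            "H,A,X,1,3,1,1,1,1,1,0,0,0",
            "C,S,O,1,3,1,1,2,1,1,0,0,0",
            "H,A,X,1,3,1,1,1,1,1,0,0,0");

        for (String row : removeDuplicates(rows)) {
            System.out.println(row);
        }
    }
}
```

String already compares by content in equals and hashCode, so nothing extra is needed in this variant. It assumes that "redundant" means an exact textual match of the whole row.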
rayd09
In addition to your suggestion, you must also implement the equals and hashCode methods of Tuple. Otherwise, duplicate detection will be based on the Tuple object reference only.
LiorH
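
To illustrate the comment above, here is one possible sketch of a Tuple with value-based equals and hashCode. The answer does not show how Tuple is defined, so the layout used here (an array of String values, one per column) is an assumption made purely for illustration.

```java
import java.util.Arrays;

// Hypothetical Tuple class; the original answer does not show its definition.
final class Tuple {
    private final String[] values;

    Tuple(String... values) {
        // Defensive copy so later changes to the caller's array cannot alter the tuple.
        this.values = values.clone();
    }

    // Two tuples are equal when they hold the same values in the same order.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Tuple)) {
            return false;
        }
        return Arrays.equals(values, ((Tuple) other).values);
    }

    // Must agree with equals: equal tuples produce equal hash codes.
    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }

    @Override
    public String toString() {
        return String.join(",", values);
    }
}
```

With a class like this, adding two separately constructed tuples that hold the same values to a HashSet leaves only one of them in the set. Without the overrides, both would be kept, because the default Object implementations compare references.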