The code is hard to read and is not very efficient. The "dest" parameter is confusing: It's passed as a parameter, then it's cleared and the results are added to it. What's the point of it being a parameter? Why not simply return a new collection? The only benefit I can see is that the caller can determine the collection type. Is that necessary?
I think this code can be more clearly and probably more efficiently written as follows:
public static Set<String> createSet(Collection<String> source) {
Set<String> destination = new HashSet<String>(source) {
private static final long serialVersionUID = 1L;
public boolean add(String o) {
if ("".equals(o)) {
return false;
}
return super.add(o);
}
};
return destination;
}
Another way is to create your own set type:
public class NonEmptyStringSet extends HashSet<String> {
private static final long serialVersionUID = 1L;
public NonEmptyStringSet() {
super();
}
public NonEmptyStringSet(Collection<String> source) {
super(source);
}
public boolean add(String o) {
if ("".equals(o)) {
return false;
}
return super.add(o);
}
}
Usage:
createSet(source);
new NonEmptyStringSet(source);
Returning the set is more performant because you don't first have to create a temporary set and then add all to the dest collection.
The benefit of the NonEmptyStringSet type is that you can keep adding strings and still have the empty string check.
EDIT1:
Removing the "if(src.containsAll(dest)) return;" code introduces a "bug" when calling the method with source == dest; The result is that source will be empty. Example:
Collection<String> source = new ArrayList<String>();
source.add("abc");
copyStringCollectionAndRemoveDuplicates(source, source);
System.out.println(source);
EDIT2:
I did a small benchmark which shows that my implementation is about 30% faster then a simplified version of your initial implementation. This benchmark is an optimal case for your initial implementation because the dest colletion is empty, so it doesn't have to clear it. Also take not that my implementation uses HashSet instead of LinkedHashSet which makes my implementation a bit faster.
Benchmark code:
public class SimpleBenchmark {
public static void main(String[] args) {
Collection<String> source = Arrays.asList("abc", "def", "", "def", "",
"jsfldsjdlf", "jlkdsf", "dsfjljka", "sdfa", "abc", "dsljkf", "dsjfl",
"js52fldsjdlf", "jladsf", "dsfjdfgljka", "sdf123a", "adfgbc", "dslj452kf", "dsjfafl",
"js21ldsjdlf", "jlkdsvbxf", "dsfjljk342a", "sdfdsa", "abxc", "dsljkfsf", "dsjflasd4" );
int runCount = 1000000;
long start1 = System.currentTimeMillis();
for (int i = 0; i < runCount; i++) {
copyStringCollectionAndRemoveDuplicates(source, new ArrayList<String>());
}
long time1 = (System.currentTimeMillis() - start1);
System.out.println("Time 1: " + time1);
long start2 = System.currentTimeMillis();
for (int i = 0; i < runCount; i++) {
new NonEmptyStringSet(source);
}
long time2 = (System.currentTimeMillis() - start2);
System.out.println("Time 2: " + time2);
long difference = time1 - time2;
double percentage = (double)time2 / (double) time1;
System.out.println("Difference: " + difference + " percentage: " + percentage);
}
public static class NonEmptyStringSet extends HashSet<String> {
private static final long serialVersionUID = 1L;
public NonEmptyStringSet() {
}
public NonEmptyStringSet(Collection<String> source) {
super(source);
}
@Override
public boolean add(String o) {
if ("".equals(o)) {
return false;
}
return super.add(o);
}
}
public static void copyStringCollectionAndRemoveDuplicates(
Collection<String> src, Collection<String> dest) {
Set<String> uniqueSet = new LinkedHashSet<String>(src.size());
for (String f : src)
if (!"".equals(f))
uniqueSet.add(f);
dest.addAll(uniqueSet);
}
}