views:

1056

answers:

4

Ok so here is my issue. I have to HashSet's, I use the Removeall method to delete values that exist in one set from the other.

Prior to calling the method, I obviously add the values to the Sets. I call .toUpperCase() on each String before adding because the values are of different cases in both lists. There is no rhyme or reason to the case.

Once I call RemoveAll, I need to have the original cases back for the values that are left in the Set. Is there an efficient way of doing this without running through the original list and using CompareToIgnoreCase?

Example:

List1:

"BOB"
"Joe"
"john"
"MARK"
"dave"
"Bill"

List2:

"JOE"
"MARK"
"DAVE"

After this, create a separate HashSet for each List using toUpperCase() on Strings. Then call RemoveAll.

Set1.removeAll(set2);

Set1:
    "BOB"
    "JOHN"
    "BILL"

I need to get the list to look like this again:

"BOB"
"john"
"Bill"

Any ideas would be much appreciated. I know it is poor, there should be a standard for the original list but that is not for me to decide. Thank you!

+1  A: 

You can use a hashmap and use the capital set as keys that map to the mixed case set.

Keys of hashmaps are unique and you can get a set of them using HashMap.keyset();

to retrieve the original case, it's as simple as HashMap.get("UPPERCASENAME").

And according to the documentation:

Returns a set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. The set supports element removal, which removes the corresponding mapping from this map, via the Iterator.remove, Set.remove, removeAll, retainAll, and clear operations. It does not support the add or addAll operations.

So HashMap.keyset().removeAll will effect the hashmap :)

EDIT: use McDowell's solution. I overlooked the fact that you didn't actually need the letters to be upper case :P

Charles Ma
A: 

as far as i know, hashset's use the object's hashCode-method to distinct them from each other. you should therefore override this method in your object in order to distinct cases.

if you're really using string, you cannot override this method as you cannot extend the String-class.

therefore you need to create your own class containing a string as attribute which you fill with your content. you might want to have a getValue() and setValue(String) method in order to modify the string.

then you can add your own class to the hashmap.

this should solve your problem.

regards

Atmocreations
+7  A: 

In my original answer, I unthinkingly suggested using a Comparator, but this causes the TreeSet to voilate the equals contract and is a bug waiting to happen:

// Don't do this:
Set<String> setA = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
setA.add("hello");
setA.add("Hello");
System.out.println(setA);

Set<String> setB = new HashSet<String>();
setB.add("HELLO");
// Bad code; violates symmetry requirement
System.out.println(setB.equals(setA) == setA.equals(setB));


It is better to use a dedicated type:

public final class CaselessString {
  private final String string;
  private final String normalized;

  private CaselessString(String string, Locale locale) {
    this.string = string;
    normalized = string.toUpperCase(locale);
  }

  @Override public String toString() { return string; }

  @Override public int hashCode() { return normalized.hashCode(); }

  @Override public boolean equals(Object obj) {
    if (obj instanceof CaselessString) {
      return ((CaselessString) obj).normalized.equals(normalized);
    }
    return false;
  }

  public static CaselessString as(String s, Locale locale) {
    return new CaselessString(s, locale);
  }

  public static CaselessString as(String s) {
    return as(s, Locale.ENGLISH);
  }

  // TODO: probably best to implement CharSequence for convenience
}

This code is less likely to cause bugs:

Set<CaselessString> set1 = new HashSet<CaselessString>();
set1.add(CaselessString.as("Hello"));
set1.add(CaselessString.as("HELLO"));

Set<CaselessString> set2 = new HashSet<CaselessString>();
set2.add(CaselessString.as("hello"));

System.out.println("1: " + set1);
System.out.println("2: " + set2);
System.out.println("equals: " + set1.equals(set2));

This is, unfortunately, more verbose.

McDowell
No need to roll your own Comparator. The String class provides one for you: http://java.sun.com/javase/6/docs/api/java/lang/String.html#CASE_INSENSITIVE_ORDER
banjollity
@bankollity. Thanks! - that's been there since Java 1.2 and I've never noticed it. Code amended.
McDowell
Wow that was extremely simple to implement although the documentation leads you to believe that the comparator is used solely for sorting. TreeSet(Comparator c) : Constructs a new, empty set, sorted according to the specified comparator. http://java.sun.com/j2se/1.4.2/docs/api/java/util/TreeSet.html#TreeSet%28java.util.Comparator%29 . I am glad it worked though, Thank you so much for your response!
I should perhaps note that String.CASE_INSENSITIVE_ORDER should be OK for English-language identifiers, but isn't something you should rely on for non-English comparison.
McDowell
I'm not sure what I was thinking when I wrote this answer - this violates the contract of `Set`. I'll have to come back with a better solution.
McDowell
+1 for providing another answer which doesn't only do what needs to be done, but is also correct (as in obeys requirements).
extraneon
A: 

This would be an interesting one to solve using google-collections. You could have a constant Predicate like so:

private static final Function<String, String> TO_UPPER = new Function<String, String>() {
    public String apply(String input) {
       return input.toUpperCase();
}

and then what you're after could be done someting like this:

Collection<String> toRemove = Collections2.transform(list2, TO_UPPER);

Set<String> kept = Sets.filter(list1, new Predicate<String>() {
    public boolean apply(String input) {
        return !toRemove.contains(input.toUpperCase());
    }
}

That is:

  • Build an upper-case-only version of the 'to discard' list
  • Apply a filter to the original list, retaining only those items whose uppercased value is not in the upper-case-only list.

Note that the output of Collections2.transform isn't an efficient Set implementation, so if you're dealing with a lot of data and the cost of probing that list will hurt you, you can instead use

Set<String> toRemove = Sets.newHashSet(Collections2.transform(list2, TO_UPPER));

which will restore an efficient lookup, returning the filtering to O(n) instead of O(n^2).

Cowan