views:

260

answers:

6

Let's say I want to put words in a data structure and I want to have constant time lookups to see if the word is in this data structure. All I want to do is to see if the word exists. Would I use a HashMap (containsKey()) for this? HashMaps use key->value pairings, but in my case I don't have a value. Of course I could use null for the value, but even null takes space. It seems like there ought to be a better data structure for this application.

The collection could potentially be used by multiple threads, but since the objects contained by the collection would not change, I do not think I have a synchronization/concurrency requirement.

Can anyone help me out?

+20  A: 

Use HashSet instead. It's a hash implementation of Set, which is used primarily for exactly what you describe (an unordered set of items).

Daniel Lew
If I were to use something like hashSet, does that mean I can't use strings?hashSet.add("key");hashSet.contains("key"); //false, right? Since the two keys are separate objects?
jbu
nope, it returns true. Hashset is perfect
jbu
You should be able to use Strings just fine, because String has its own implementation of hashCode() which returns the same hash for strings that are equal. Reference: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#hashCode()
Daniel Lew
You could also look into the intern() function if you want the same String representation
Ed Marty
Unnecessary here, as HashSets are based solely on hashCode and equals, and String will always behave as expected for those.
Matthew Flaschen
Note: the HashSet uses a HashMap and a dummy static Object for the value associated to each entry (key of the HashMap).
Carlos Heuberger
+6  A: 

You want to use a Collection implementing the Set interface, probably HashSet to get the performance you stated. See http://java.sun.com/javase/6/docs/api/java/util/Set.html

Brad B
If I were to use something like hashSet, does that mean I can't use strings?hashSet.add("key");hashSet.contains("key"); //false, right? Since the two keys are separate objects?
jbu
nope, it returns true. Hashset is perfect
jbu
jbu, it's going to use first hashCode() and then equals() to check for membership (the latter to check for collisions). For objects from the standard library, like String, you're in good shape. If you define your own objects, make sure hashCode() and equals() are correctly defined.
Brad B
+1  A: 

Other than Sets, in some circumstances you might want to convert a Map into a Set with Collections.newSetFromMap(Map<E,Boolean>) (some Maps disallow null values, hence the Boolean).

Tom Hawtin - tackline
+6  A: 

You probably want to use a java.util.Set. Implementations include java.util.HashSet, which is the Set equivalent of HashMap.

Even if the objects contained in the collection do not change, you may need to do synchronization. Do new objects need to be added to the Set after the Set is passed to a different thread? If so, you can use Collections.synchronizedSet() to make the Set thread-safe.

If you have a Map with values, and you have some code that just wants to treat the Map as a Set, you can use Map.entrySet() (though keep in mind that entrySet returns a Set view of the keys in the Map; if the Map is mutable, the Map can be changed through the set returned by entrySet).

NamshubWriter
+5  A: 

You'd generally use an implementation of Set, and most usually HashSet. If you did need concurrent access, then ConcurrentHashSet provides a drop-in replacement that provides safe, concurrent access, including safe iteration over the set.

I'd recommend in any case referring to it as simply a Set throughout your code, except in the one place where you construct it; that way, it's easier to drop in one implementation for the other if you later require it.

Even if the set is read-only, if it's used by a thread other than the one that creates it, you do need to think about safe publication (that is, making sure that any other thread sees the set in a consistent state: remember any memory writes, even in constructors, aren't guaranteed to be made available to other threads when or in the otder you expect, unless you take steps to ensure this). This can be done by both of the following:

  • making sure the only reference(s) to the set are in final fields;
  • making sure that it really is true that no thread modifies the set.

You can help to ensure the latter by using the Collections.unmodifiableSet() wrapper. This gives you an unmodifiable view of the given set-- so provided no other "normal" reference to the set escapes, you're safe.

Neil Coffey
A: 

Hello, as everyone said HashSet is probably the simplest solution but you won't have constant time lookup in a HashSet (because entries may be chained) and you will store a dummy object (always the same) for every entry...

For information here a list of data structures maybe you'll find one that better fits your needs.

pgras