views:

148

answers:

7

What's the most natural way to model a group of objects that form a set? For example, you might have a bunch of user objects who are all subscribers to a mailing list.

Obviously you could model this as an array, but then you have to order the elements and whoever is using your interface might be confused as to why you're encoding arbitrary ordering data.

You can use a hash where the members are keys that map to "1" or "true", but in most languages there are restrictions on what data types a hash key can be.

What's the standard way to do this in modern languages (PHP, Perl, Ruby, Python, etc)?

+1  A: 

In Python, you would use the set datatype. A set supports containing any hashable object, so if you have a custom class you need to store in a set and the default hashable behaviour is not appropriate, you can implement __hash__ to implement the behaviour you want.

Greg Hewgill
+1  A: 

C# has the HashSet<T> generic collection.

public class EmailAddress  // probably needs to override GetHashCode()
{
   ...
}

var addresses = new HashSet<EmailAddress>();
tvanfosson
+1  A: 

Most modern languages are going to have some form of Set data structure. Java has HashSet, which implements the Set interface.

In PHP you can use an array to store your data. Either search the array before you add a new element, or use array_unique to remove duplicates after inserting all elements.

Bill the Lizard
A: 

In c as a stand-in for understanding the machine directly:

  • For small, discrete and well defined ranges: use a bitwise array to indicate the presence of each possible item (set for present, unset for absent).
  • Use a hash-table for all other cases.

Write functions to implement adding and removing items, testing for presence or absence, testing for sub-sets, etc as needed.


As the other answers note, however, if you just want the functionality, use a language feature or third-party library that is already well debugged.

dmckee
A: 

A lot of the time hash-based sets are the correct thing to use, but if you don't need to do key-based lookups and don't worry about enforcing unique values, a vector or list is fine. There is overhead to a hash table, after all.

You seem to be concerned that people will think that the order in the vector is important, but I think that it is a common enough usage that, with documentation, you shouldn't confuse people.

It really depends on how you want to access and use the data.

David Norman
A: 

and Array is usually the simplest way to store data, without any other requirements. Usually other data types are used for different reasons (you want to append data, you want to search data in constant time, you need quick set union/intersection, etc) If your only concern is the abstraction you could wrap it in some kind of unordered facade.

Jimmy
A: 

In Perl I would use a hash, definitely. In other languages I would lament the lack of a hash.

skiphoppy
which other languages would that be?
Jimmy
Any language that hasn't got hashes. :) Plenty of them do, including anything I might use nowadays, like Java.
skiphoppy