views:

329

answers:

7

I have a task for which it is necessary to generate a unique value for every object in a set. using the hashcode would be perfect, if collisions weren't allowed in the hashcode contract.

One idea: Record every object's hashcode into a multiset. Then, use hashcodes as the unique identifier, but if that hashcode is in the set more than once, use a different value that is also not in the set. But this feels bulky and awkward.

Better ideas?

Here's what I have already:

public static <V> void toGraphViz(final Graph<V, DefaultWeightedEdge> g, String filename) {

    // to avoid hashcode collisions
    final Set<Integer> hashcodes = new HashSet<Integer>(g.vertexSet().size());

    DOTExporter<V, DefaultWeightedEdge> dot = new DOTExporter<V, DefaultWeightedEdge>(new VertexNameProvider<V> () {

    // vertex name must be unqiue
    @Override
    public String getVertexName(V arg0) {
        int hash = arg0.hashCode();
        while (hashcodes.contains((hash))) {
            hash += 1;
        }
        return "" + hash;
    }
}

EDIT: I guess this wasn't originally clear, but the id number does somehow need to be a function of the object, because getVertexName(V) will get called several times, and it expects that for the same values of V, it will get the same results.

Also, the Vertex type is generic. So I can't make any modifications to a specific class to fix this.

+2  A: 

Why not just use a serial number?

static private int serial=0;
static public synchronized nextSerialNumber() { return ++serial; }

Or a combination/hybrid, say a long of ((hash<<32) | getNextSerial()).

To address the EDIT clarification

When you construct the object, allocate the serial number to a private member variable and return it for hashCode(). You should then override equals with a call to super.equals() (since a generated serial number is consistent with the default equals() implementation) because seeing a hashCode() override without a corresponding equals() override will red-flag the code to tools (and other programmers).

public class Vertex
{
private final int                   serial;                                 // instance serial number

public Vertex() {
    serial=nextSerialNumber();
    ...
    }

public int hashCode() {
    return serial;
    }

public boolean equals(Object obj) {
    return super.equals(obj);                                               // serial number hash-code consistent with default equals    
    }

...        

static private int nextSerial=0;
static public synchronized nextSerialNumber() { return nextSerial++; }
}
Software Monkey
that would work, but I don't have a specific Vertex class - it's a generic.
Rosarch
+4  A: 

What is the lifetime of this unique number? Just the lifetime of the program? In which case why not just a simple static counter in the class, accessed with suitable synchronisation? Increment it for each new object. No need to keep a list of the values you have used, just the highest value you have used.

If unique across many executions (and maybe many simultaneous instances) then perhaps you can just use a Database which generates unqiue record ids.

EDITED in response to clarification

The piece I missed before was that we can't modify the class for which we want to generate the unique "hash".

I think that working from the hash code of the class, which will have collisions is making life hard. Assuming that we can rely upon the Vertex classes in question having correctly implemented equals() then we can use the object itself as a key to the set of hashcodes we have used.

public class Hasher {

    public  <V> void toGraphViz(final Graph<V, DefaultWeightedEdge> g, String filename) {
         final Map<V, Integer> hashcodes = new HashMap< V, Integer>();
         final int latestHashHolder[] = { 0 }; // array to allow access from inner class

         DOTExporter<V, DefaultWeightedEdge> dot 
                 = new DOTExporter<V, DefaultWeightedEdge>(new VertexNameProvider<V> ()) {

         // vertex name must be unqiue
         @Override
         public synchronized String getVertexName(V vertex) {
          int hashcode;
          if ( hashcodes.containsKey(vertex)){
           hashcode = hashcodes.get(vertex);
          } else {       
           hashcode = latestHashHolder[0];
           latestHashHolder[0]++;
           hashcodes.put(vertex, (Integer)latestHashHolder[0]);
          }
          return "Vertex-" + hashcode;
         }
        };
    }
}
djna
+2  A: 

You could consider using a UUID, depending on what you are trying to accomplish...

Steven Schlansker
+2  A: 

To find a unique value for an object, you have to know a combination of properties that make the object unique.

To run ".contains()", you need to have a method of determining ".equals()", which means you should already know how to uniquely identify a Vertex, so maybe you can come up with an expression of the unique properties?

e.g., "(x, y, z, rgb)"

Unless I'm misunderstanding the question, I wouldn't recommend mucking with an object's hashCode for this purpose.

BranTheMan
A: 

I probably don't understand what you are doing, but consider creating a reference to each object. Since the reference contains the address of the object it will be unique for each object.

marichyasana
A: 

It's not that difficult, is it? Just use a different hash algorithm, if the one in Java doesn't guarantee no collisions. Send the object to the hash algorithm, e.g. Sha-256, and use that as the key. If you need to keep different copies of exact same object, with different hash values, use a seed when you perform the hash, and store this related to the object with the hash.

Erik A. Brandstadmoen
+1  A: 

I think you misunderstood hashcode. Based on the contract the hascode should be the same when equals(..) is true and vice versa. So in your case only a vertex with the same properties should have the same hascode, otherwise your self written hascode calculation method should be fixed. As far as I have understood your question a vertex for itself is unique, so you shouldn't have a problem, right?

Patrick Cornelissen