Generally speaking, can you suggest an approach that would let me test objects to see whether they are alike?

I'd consider two objects alike if more than n% of their content is identical.

Other than brute force, are there any libraries I can take advantage of?

Thanks

A: 

One thing you can try is serializing both objects to a text format and comparing the results; I've done this with JSON in particular. For detecting whether objects match exactly, this is straightforward.
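A minimal sketch of this approach, assuming (for illustration only) that each object's state is available as a flat `Map` of field names to values, and Java 9+ for `Map.of`. The `encode` and `quote` helpers are hypothetical; a real project would use a JSON library, and this hand-rolled encoder does no escaping. Sorting the keys makes the encoding canonical, so two objects with the same fields in a different order produce the same string:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class JsonCompare {
    // Hypothetical helper: encodes a flat map of fields as a JSON-like string.
    // Copying into a TreeMap sorts the keys, so field order does not matter.
    // NOTE: no escaping of quotes or special characters; illustration only.
    static String encode(Map<String, Object> fields) {
        return new TreeMap<String, Object>(fields).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Strings are wrapped in quotes; everything else uses String.valueOf.
    static String quote(Object value) {
        return (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, Object> a = Map.of("name", "Ann", "age", 30);
        Map<String, Object> b = Map.of("age", 30, "name", "Ann");
        System.out.println(encode(a));                  // {"age":30,"name":"Ann"}
        System.out.println(encode(a).equals(encode(b))); // true
    }
}
```

For the n% case, the two encoded strings could then be compared with a string-similarity measure instead of `equals()`.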

Frank Schwieterman
+1  A: 

This could only be done on a case-by-case basis. If I really needed this functionality, I'd define an interface:

public interface Similar<Entity> {
    boolean isSimilar(Entity other);
}

Each implementing class can define what it means to be 'similar' to another instance. The things to keep in mind are the same issues you would consider for cloning: shallow copy vs. deep copy, etc.

A naive implementation of Person:

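public class Person implements Similar<Person> {
    private final String firstName;
    private final String lastName;

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getLastName() {
        return lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    // Naive rule: two people are "similar" if either their last names or
    // their first names match, ignoring case. Null names never match.
    @Override
    public boolean isSimilar(Person other) {
        if (other == null) {
            return false;
        }
        return (lastName != null && lastName.equalsIgnoreCase(other.getLastName()))
            || (firstName != null && firstName.equalsIgnoreCase(other.getFirstName()));
    }
}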
Paul Croarkin
+3  A: 

As a starting point, have a look at the Levenshtein distance and see whether it's relevant to your use case.
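For reference, here is a sketch of the standard dynamic-programming Levenshtein (edit) distance in Java. The `similarityPercent` helper is a hypothetical addition, not part of the standard definition: it normalizes the distance by the longer string's length so the result can be checked against an n% threshold.

```java
class Levenshtein {
    // Minimum number of single-character insertions, deletions or
    // substitutions needed to turn a into b. Keeps only two rows of the
    // DP table, so memory is O(b.length()).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution
            }
            int[] tmp = prev; // the row just filled becomes "previous"
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: 100.0 means identical, 0.0 means nothing in common.
    static double similarityPercent(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 100.0; // two empty strings are identical
        }
        return 100.0 * (1.0 - (double) distance(a, b) / maxLen);
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(similarityPercent("kitten", "sitting"));
    }
}
```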

Neil Coffey
+1  A: 

I believe you can find a good solution if you focus on the details of your specific problem. The only "reasonable" solution I have in mind for the general case is based on reflection: scan the data members and recursively compute the similarity of each corresponding pair of members.

However, there are so many problems with this idea that I don't think it's feasible. Among them:

1) The weight of each member subtree must be well defined in order to return a similarity percentage.

2) How should data members that belong to only one of the objects be handled? This will happen frequently when comparing an instance of class A to an instance of a descendant class B.

3) Maybe the biggest problem: the mapping from an object's internal structure to its abstract data representation is not injective. For example, two hash maps representing the same mapping may have different internal structures, due to a different history of table reallocations.
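A deliberately shallow sketch of the reflection idea, which sidesteps the problems above by assumption rather than solving them: both objects must be non-null and of exactly the same class, every declared instance field carries equal weight, inherited fields are ignored, and field values are compared with `Objects.equals` instead of recursively (so a `HashMap` field is compared by its mappings, not its internal table). `FieldSimilarity`, `percentEqualFields` and `Point3` are hypothetical names:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

class FieldSimilarity {
    // Percentage of declared instance fields whose values are equal.
    // Assumes a and b are non-null and of exactly the same class.
    static double percentEqualFields(Object a, Object b) {
        if (a.getClass() != b.getClass()) {
            throw new IllegalArgumentException("objects must be of the same class");
        }
        int total = 0;
        int equal = 0;
        for (Field f : a.getClass().getDeclaredFields()) {
            // Static and compiler-generated fields are not part of the state.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            f.setAccessible(true);
            total++;
            try {
                if (Objects.equals(f.get(a), f.get(b))) {
                    equal++;
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return total == 0 ? 100.0 : 100.0 * equal / total;
    }

    // Hypothetical sample class used only for this illustration.
    static class Point3 {
        int x, y, z;

        Point3(int x, int y, int z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    public static void main(String[] args) {
        // Two of the three fields match.
        System.out.println(percentEqualFields(new Point3(1, 2, 3), new Point3(1, 2, 4)));
    }
}
```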

Eyal Schneider
A: 

You could implement the Comparable interface and define your own comparison logic for instances of a class.

As mentioned above, for text similarity you could use the distance algorithms in the SimMetrics library (http://www.dcs.shef.ac.uk/~sam/simmetrics.html).

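Another option is to compare the objects' hash codes (after overriding the hashCode() method inherited from Object), though I'm not sure that's what you are looking for: equal hash codes don't guarantee equal objects, and a hash code says nothing about partial similarity.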
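To illustrate why hash codes alone can't establish equality: in Java, the strings "Aa" and "BB" have the same `String.hashCode()`, so a class whose `hashCode()` is built from such a field collides as well. `HashCollision` and `Tag` are hypothetical names for this sketch:

```java
import java.util.Objects;

class HashCollision {
    // Hypothetical value class whose equals/hashCode depend on one field.
    static final class Tag {
        final String name;

        Tag(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Tag && Objects.equals(name, ((Tag) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    public static void main(String[] args) {
        Tag a = new Tag("Aa");
        Tag b = new Tag("BB");
        // "Aa" and "BB" share a String hash code, so these Tags collide too.
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(b));                  // false
    }
}
```

A matching hash code is therefore at most a cheap pre-check before a real comparison.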

andreas