views:

167

answers:

8

I'm currently working on a class that calculates the difference between two objects. I'm trying to decide what the best design for this class would be. I see two options:

1) Single-use class instance. Takes the objects to diff in the constructor and calculates the diff for that.

public class MyObjDiffer {
  public MyObjDiffer(MyObj o1, MyObj o2) {
    // Calculate diff here and store results in member variables
  }

  public boolean areObjectsDifferent() {
    // ...
  }

  public Vector getOnlyInObj1() {
    // ...
  }

  public Vector getOnlyInObj2() {
    // ...
  }

  // ...
}

2) Re-usable class instance. Constructor takes no arguments. Has a "calculateDiff()" method that takes the objects to diff, and returns the results.

public class MyObjDiffer {
  public MyObjDiffer() { }

  public DiffResults getResults(MyObj o1, MyObj o2) {
    // calculate and return the results.  Nothing is stored in this class's members.
  }
}

public class DiffResults {
  public boolean areObjectsDifferent() {
    // ...
  }

  public Vector getOnlyInObj1() {
    // ...
  }

  public Vector getOnlyInObj2() {
    // ...
  }
}

The diffing will be fairly complex (details don't matter for the question), so there will need to be a number of helper functions. If I take solution 1 then I can store the data in member variables and don't have to pass everything around. It's slightly more compact, as everything is handled within a single class.

However, conceptually, it seems weird that a "Differ" would be specific to a certain set of results. Option 2 splits the results from the logic that actually calculates them.

EDIT: Option 2 also provides the ability to make the "MyObjDiffer" class static. Thanks kitsune, I forgot to mention that.

I'm having trouble seeing any significant pro or con to either option. I figure this kind of thing (a class that just handles some one-shot calculation) has to come up fairly often, and maybe I'm missing something. So, I figured I'd pose the question to the cloud. Are there significant pros or cons to one or the other option here? Is one inherently better? Does it matter?

I am doing this in Java, so there might be some restrictions on the possibilities, but the overall question of design is probably language-agnostic.

A: 

I'd take numero 2 and reflect on whether I should make this static.

kitsune
A: 

It depends on how you're going to use diffs. In my mind, it makes sense to treat diffs as a logical entity because it needs to support some operations like 'getDiffString()', or 'numHunks()', or 'apply()'. I might take your first one and do it more like this:

public class Diff
{
    public Diff(String path1, String path2)
    {
        // get diff

        if (same)
            throw new EmptyDiffException();
    }

    public String getDiffString()
    {

    }

    public int numHunks()
    {

    }

    public bool apply(String path1)
    {
        // try to apply diff as patch to file at path1.  Return
        // whether the patch applied successfully or not.
    }

    public bool merge(Diff diff)
    {
        // similar to apply(), but do merge yourself with another diff
    }
}

Using a diff object like this also might lend itself to things like keeping a stack of patches, or serializing to a compressed archive, maybe an "undo" queue, and so on.

Ben Collins
This isn't quite a normal diff. It's not running on strings/files, and it's not generating anything for a patch. A slightly simplified version of my problem: Making sure 2 hashtables have all the same keys, ignoring the actual values assigned to those keys. There's no applying, just calculating.
Herms
A: 

Why are you writing a class whose only purpose is to calculate the difference between two objects? That sounds like a task either for a static function or a member function of the class.

Ryan
The thing I'm diffing is sealed. Can't add it to the class. A static method could work, which would essentially be option 2.
Herms
The thing with a static method is that if you want to have extra options, you have to make ever-bigger signatures (or create a DiffOptions class. Yuk!)
slim
A: 

I would go for a static constructor method, something like.

Diffs diffs = Diffs.calculateDifferences(foo, bar);

In this way, it's clear when you're calculating the differences, and there is no way to misuse the object's interface.

daveb
Hmm. I like that, though the calculateDifferences() method would be rather complicated. I'd need a bunch of static helper methods. Probably not a big deal, but that just doesn't feel right to me.
Herms
If you wanted, calculateDifferences could return new Diffs(foo, bar);and the constructor could be private.
daveb
@Herms - private inner classes might help. Or even a private constuctor.
slim
A: 

I like the idea of explicitly starting the work rather than having it occur on instantiation. Also, I think the results are substantial enough to warrant their own class. Your first design isn't as clean to me. Someone using this class would have to understand that after performing the calculation some other class members are now holding the results. Option 2 is more clear about what is happening.

JC
+3  A: 

Use Object-Oriented Programming

Use option 2, but do not make it static.

The Strategy Pattern

This way, an instance MyObjDiffer can be passed to anyone that needs a Strategy for computing the difference between objects.

If, down the road, you find that different rules are used for computation in different contexts, you can create a new strategy to suit. With your code as it stands, you'd extend MyObjDiffer and override its methods, which is certainly workable. A better approach would be to define an interface, and have MyObjDiffer as one implementation.

Any decent refactoring tool will be able to "extract an interface" from MyObjDiffer and replace references to that type with the interface type at some later time if you want to delay the decision. Using "Option 2" with instance methods, rather than class procedures, gives you that flexibility.

Configure an Instance

Even if you never need to write a new comparison method, you might find that specifying options to tailor the behavior of your basic method is useful. If you think about using the "diff" command to compare text files, you'll remember how many different options there are: whitespace- and case-sensitivity, output options, etc. The best analog to this in OO programming is to consider each diff process as an object, with options set as properties on that object.

erickson
This is currently for a quick one-off validation app I'm doing, so I don't see it (in my specific case) being turned into a strategy pattern. But you do bring up a good point.
Herms
One question to consider: what is the advantage of using a static method? The only argument I see for static methods is that they make it easier to set up clients of the Differ API (they don't need to be given an instance somehow, they always use the same method). But that costs flexibility.
erickson
Static is mainly nice as it makes calling the function just a little easier (and *slightly* less resource intensive, not that it's really intensive anyway). You do lose a lot of flexibility though, so it's probalby not worth it in this case.
Herms
The client can always write a static wrapper to the instance invocation, if they really hate typing.
slim
There are not quick one-off validation apps. Everything lives forever. It's no more complex to have a first-class object. The "little easier to call" is of no real value.
S.Lott
+1  A: 

I can't really say I have firm reasons why it's the 'best' approach, but I usually write classes for objects that you can have a 'conversation' with. So it would be like your 'single use' option 1, except that by calling a setter, you would 'reset' it for another use.

Rather than supplying the implementation (which is pretty obvious), here's a sample invocation:

MyComparer cmp = new MyComparer(obj1, obj2);
boolean match = cmp.isMatch();
cmp.setSubjects(obj3,obj4);
List uniques1 = cmp.getOnlyIn(MyComparer.FIRST);
cmd.setSubject(MyComparer.SECOND,obj5);
List uniques = cmp.getOnlyIn(MyComparer.SECOND);

... and so on.

This way, the caller gets to decide whether they want to instantiate lots of objects, or keep reusing the one.

It's particularly useful if the object needs a lot of setup. Lets say your comparer takes options. There could be many. Set it up once, then use it many times.

// set up cmp with options and the master object
MyComparer cmp = new MyComparer();
cmp.setIgnoreCase(true);
cmp.setIgnoreTrailingWhitespace(false);
cmp.setSubject(MyComparer.FIRST,canonicalSubject);

// find items that are in the testSubjects objects,
// but not in the master.
List extraItems = new ArrayList();
for (Iterator it=testSubjects.iterator(); it.hasNext(); ) {
    cmp.setSubject(MyComparer.SECOND,it.next());
    extraItems.append(cmp.getOnlyIn(MyComparer.SECOND);
}

Edit: BTW I called it MyComparer rather than MyDiffer because it seemed more natural to have an isMatch() method than an isDifferent() method.

slim
So you're basically doing approach 1.5. The class holds all the results, but can be reused to recalculate those. Good point about the setup stuff.
Herms
Spotted a downside now that I'm thinking about it - the class will keep a reference to the inputs for as long as you need the results. That's a good reason to have a diffResults class. Adding that doesn't affect the rest of this approach.
slim
+1  A: 

You want solution #2 for a number of reasons. And you don't want it to be static.

While static seems like fun, it's a maintenance nightmare when you come up with either (a) a new algorithm with the same requirements, or (b) new requirements.

A first-class object (without much internal state) allows you to evolve into a class hierarchy of different differs -- some slower, some faster, some with more memory, some with less memory, some for old requirements, some for new requirements.

Some of your differs may wind up with complicated internal state or memory, or incremental diffing or hash-code-based diffing. All kinds of possibilities might exist.

A reusable object allows you to pick your differ at application start-up time using a properties file.

In the long run, you want to minimize the number of new operations that are scattered throughout your application. You'd like to have your new operations focused in places where you can find and control them. To change from old differ algorithm to new differ algorithm, you'd like to do the following.

  1. Write the new subclass.

  2. Update a properties file to start using the new subclass.

And be completely confident that there wasn't some hidden d= new MyObjDiffer( x, y ) tucked away that you didn't know about.

You want to use d= theDiffer.getResults( x, y ) everywhere.

What the Java libraries do is they have a DifferFactory that's static. The factor checks the properties and emits the correct Differ.

DifferFactory df= new DifferFactory();
MyObjDiffer mod= df.getDiffer();
mod.getResults( x, y );

The Factory typically caches the single copy -- it doesn't have to physically read the properties every time getDiffer is called.

This design gives you ultimate flexibility in the future. At it looks like other parts of the Java libraries.

S.Lott