I have a pipeline-based application that analyzes text in different languages (say, English and Chinese). My goal is to have a system that works on both languages in a transparent way. NOTE: This question is long because it has many simple code snippets.

The pipeline is composed of three components (let's call them A, B, and C), and I've created them in the following way, so that the components are not tightly coupled:

public class Pipeline {
    private A componentA;
    private B componentB;
    private C componentC;

    // I really just need the language attribute of Locale,
    // but I use it because it's useful to load language specific ResourceBundles.
    public Pipeline(Locale locale) {
        componentA = new A();
        componentB = new B();
        componentC = new C();
    }

    public Output runPipeline(Input input) {
        Language lang = LanguageIdentifier.identify(input);
        // lang is not used yet: the components below are fixed at construction time
        ResultOfA resultA = componentA.doSomething(input);
        ResultOfB resultB = componentB.doSomethingElse(resultA); // uses result of A
        return componentC.doFinal(resultA, resultB); // uses result of A and B
    }
}

Now, every component of the pipeline has something inside that is language specific. For example, to analyze Chinese text I need one library, and to analyze English text I need a different one.

Moreover, there are some tasks that can be done in one language and cannot be done in the other. One solution to this problem is to make every pipeline component abstract (to implement some common methods), and then have a concrete language-specific implementation. Taking component A as an example, I'd have the following:

public abstract class A {
    private CommonClass x;  // common to all languages
    private AnotherCommonClass y; // common to all languages

    abstract SomeTemporaryResult getTemp(Input input); // language specific
    abstract AnotherTemporaryResult getAnotherTemp(Input input); // language specific

    public ResultOfA doSomething(Input input) {
        // template method
        SomeTemporaryResult t = getTemp(input); // language specific
        AnotherTemporaryResult tt = getAnotherTemp(input); // language specific
        return new ResultOfA(t, tt, x.get(), y.get());
    }
}

public class EnglishA extends A {
    private EnglishSpecificClass something;
    // implementation of the abstract methods ... 
}

In addition, since each pipeline component is very heavy and I need to reuse them, I thought of creating a factory that caches the components for further use, using a map keyed by language, like so (the other components would work in the same manner):

public enum AFactory {
    SINGLETON;

    private final Map<String, A> cache = new HashMap<>(); // this map will only have one or two keys, is there anything more efficient that I can use, instead of HashMap ?

    public A getA(Locale locale) {
        // lookup by locale.language, and insert if it doesn't exist, et cetera
        return cache.get(locale.getLanguage());
    }
}
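
A minimal sketch of the "lookup, and insert if it doesn't exist" step mentioned in the comment above, assuming Java 8 or later and that components may be requested from several threads. `HeavyComponent` and `CachingFactory` are hypothetical stand-ins for `A` and `AFactory`; the construction counter is there only to show that each language is built once.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a heavy, language-specific component such as A.
class HeavyComponent {
    static final AtomicInteger CREATED = new AtomicInteger();
    final String language;

    HeavyComponent(String language) {
        this.language = language;
        CREATED.incrementAndGet(); // counts constructions, for demonstration only
    }
}

// Enum singleton that builds a component on first request and reuses it afterwards.
enum CachingFactory {
    SINGLETON;

    private final Map<String, HeavyComponent> cache = new ConcurrentHashMap<>();

    HeavyComponent get(Locale locale) {
        // computeIfAbsent applies the constructor at most once per language key.
        return cache.computeIfAbsent(locale.getLanguage(), HeavyComponent::new);
    }
}
```

With this sketch, `Locale.ENGLISH` and `Locale.US` map to the same cached instance because both report the language "en". With only one or two keys, the choice of map is unlikely to matter for performance; if the key were an enum rather than a `String`, `java.util.EnumMap` would be a compact alternative, though it is not thread-safe on its own.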

So, my question is: What do you think of this design? How can it be improved? I need the "transparency" because the language can change dynamically, based on the text that is being analyzed. As you can see from the runPipeline method, I first identify the language of the Input, and then, based on this, I need to switch the pipeline components to the identified language. So, instead of invoking the components directly, maybe I should get them from the factory, like so:

public Output runPipeline(Input input) {
    Language lang = LanguageIdentifier.identify(input);
    // assumes each factory's lookup method accepts the identified Language
    ResultOfA resultA = AFactory.SINGLETON.getA(lang).doSomething(input);
    ResultOfB resultB = BFactory.SINGLETON.getB(lang).doSomethingElse(resultA);
    return CFactory.SINGLETON.getC(lang).doFinal(resultA, resultB);
}

Thank you for reading this far. I very much appreciate every suggestion that you can make on this question.

+1  A: 

I like the basic design. If the classes are simple enough, I might consider consolidating the A/B/C factories into a single class, as it seems there could be some sharing in behavior at that level. I'm assuming that these are really more complex than they appear, though, and that's why that is undesirable.
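
A rough sketch of what such a consolidated factory could look like, assuming Java 8 or later. All names here (`StageA`, `StageB`, `StageC`, `ComponentSet`, `PipelineComponentFactory`) are hypothetical placeholders, not classes from the question; the real components would carry the language-specific libraries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical placeholders for the three pipeline components.
class StageA { final String lang; StageA(String lang) { this.lang = lang; } }
class StageB { final String lang; StageB(String lang) { this.lang = lang; } }
class StageC { final String lang; StageC(String lang) { this.lang = lang; } }

// Bundles the three components that belong to one language.
class ComponentSet {
    final StageA a;
    final StageB b;
    final StageC c;

    ComponentSet(String lang) {
        this.a = new StageA(lang);
        this.b = new StageB(lang);
        this.c = new StageC(lang);
    }
}

// A single factory in place of separate A, B and C factories.
class PipelineComponentFactory {
    private final Map<String, ComponentSet> cache = new HashMap<>();

    // synchronized keeps the lazy initialization safe if several threads call it.
    synchronized ComponentSet forLanguage(String lang) {
        return cache.computeIfAbsent(lang, ComponentSet::new);
    }
}
```

One lookup then yields all three components for a language, so the caching logic lives in one place instead of three.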

The basic approach of using Factories to reduce coupling between components is sound, imo.

jsight
A: 

If I'm not mistaken, what you are calling a factory is actually a very nice form of dependency injection. You are selecting the object instance that is best able to meet the needs of your parameters and returning it.

If I'm right about that, you might want to look into DI platforms. They do what you did (which is pretty simple, right?) then they add a few more abilities that you may not need now but you may find would help you later.

I'm just suggesting you look at what problems are solved now. DI is so easy to do yourself that you hardly need any other tools, but they might have found situations you haven't considered yet. Google finds many great looking links right off the bat.

From what I've seen of DI, it's likely that you'll want to move the entire creation of your "Pipe" into the factory, having it do the linking for you and just handing you what you need to solve a specific problem, but now I'm really reaching--my knowledge of DI is just a little better than my knowledge of your code (in other words, I'm pulling most of this out of my butt).

Bill K
Thanks for the comments. The problem with DI is that I need the pipeline (and its components) to change at runtime. For example, I take a sentence as input, analyze it to detect its language, and then need to get the language-specific components of the pipeline (I probably need to make Pipeline an interface and have language-specific versions of it to simplify the "switch"). From what I've read about DI, the idea is to configure the dependencies externally (e.g., in an .xml file) and have them "injected" in a way that makes it infeasible to switch at runtime.
JG
+1  A: 

The factory idea is good, as is the idea, if feasible, to encapsulate the A, B, & C components into single classes for each language. One thing that I would urge you to consider is to use Interface inheritance instead of Class inheritance. You could then incorporate an engine that would do the runPipeline process for you. This is similar to the Builder/Director pattern. The steps in this process would be as follows:

  1. get input
  2. use factory method to get correct interface (english/chinese)
  3. pass interface into your engine
  4. runPipeline and get result
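
The four steps above could be sketched roughly as follows, assuming Java 8 or later. Every name here is hypothetical: `LanguagePipeline` stands for the shared interface, `ToyLanguageIdentifier` is a deliberately naive substitute for the question's `LanguageIdentifier`, and the `analyze` bodies are placeholders for the real language-specific work.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface covering the language-specific work.
interface LanguagePipeline {
    String analyze(String input);
}

class EnglishPipeline implements LanguagePipeline {
    public String analyze(String input) {
        return "en:" + input.toLowerCase(Locale.ROOT); // placeholder processing
    }
}

class ChinesePipeline implements LanguagePipeline {
    public String analyze(String input) {
        return "zh:" + input; // placeholder processing
    }
}

// Toy language detection: any Han character means Chinese, otherwise English.
class ToyLanguageIdentifier {
    static String identify(String input) {
        boolean hasHan = input.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
        return hasHan ? "zh" : "en";
    }
}

// Step 2: factory method returning (and caching) the implementation for a language.
class LanguagePipelineFactory {
    private final Map<String, LanguagePipeline> cache = new HashMap<>();

    LanguagePipeline forLanguage(String lang) {
        return cache.computeIfAbsent(lang, key -> {
            switch (key) {
                case "en": return new EnglishPipeline();
                case "zh": return new ChinesePipeline();
                default: throw new IllegalArgumentException("Unsupported language: " + key);
            }
        });
    }
}

// Steps 3 and 4: the engine depends only on the interface, never on a concrete class.
class PipelineEngine {
    String runPipeline(LanguagePipeline pipeline, String input) {
        return pipeline.analyze(input);
    }
}
```

A caller would identify the language of each input, ask the factory for the matching `LanguagePipeline`, and hand it to the engine, so the switch between languages happens per input at runtime.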

On the extends vs implements topic, Allen Holub goes a bit over the top to explain the preference for Interfaces.


Follow-up to your comments:

My interpretation of the application of the Builder pattern here would be that you have a Factory that returns a PipelineBuilder. The PipelineBuilder in my design is one that encompasses A, B, & C, but you could have separate builders for each if you like. This builder is then given to your PipelineEngine, which uses the Builder to generate your results.

As this makes use of a Factory to provide the Builders, your idea above for a Factory remains intact, replete with its caching mechanism.

With regard to your choice of abstract extension, you do have the choice of giving your PipelineEngine ownership of the heavy objects. However, if you do go the abstract way, note that the shared fields that you have declared are private and therefore would not be available to your subclasses.
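
To illustrate the visibility point, here is a small sketch with hypothetical names (`SharedResource`, `BaseComponent`, `EnglishComponent`), not the question's actual classes. A private field stays usable from the template method in the base class, but a subclass can only reach it through something like a protected accessor.

```java
// Hypothetical shared object, standing in for CommonClass.
class SharedResource {
    String get() {
        return "shared";
    }
}

abstract class BaseComponent {
    // private: subclasses cannot read this field directly.
    private final SharedResource resource = new SharedResource();

    // A protected accessor exposes the object to subclasses while the field stays private.
    protected SharedResource resource() {
        return resource;
    }

    abstract String languageSpecific(String input);

    // Template method: it can use the private field because it lives in the base class.
    public String doSomething(String input) {
        return languageSpecific(input) + "+" + resource.get();
    }
}

class EnglishComponent extends BaseComponent {
    @Override
    String languageSpecific(String input) {
        // Using the field name "resource" directly here would not compile;
        // the subclass has to go through the accessor.
        return "en(" + input + "," + resource().get() + ")";
    }
}
```

If the subclasses never need the shared objects themselves, as in the question's template method, keeping the fields private is fine.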

akf
JG
On the extends versus implements issue, I also read that article, and though it's a nice read, I think the `Collections` examples somehow miss the point, but I get the problem. However, in my particular case, I do have some heavy objects that need to be shared among all the language-specific components, and some common methods that operate on them, hence the `abstract` class.
JG