I have a pipeline-based application that analyzes text in different languages (say, English and Chinese). My goal is a system that works on both languages transparently. NOTE: This question is long because it contains many simple code snippets.
The pipeline is composed of three components (let's call them A, B, and C), and I've created them in the following way, so that the components are not tightly coupled:
public class Pipeline {
    private A componentA;
    private B componentB;
    private C componentC;
    // I really just need the language attribute of Locale,
    // but I use it because it's useful to load language-specific ResourceBundles.
    public Pipeline(Locale locale) {
        componentA = new A();
        componentB = new B();
        componentC = new C();
    }
    public Output runPipeline(Input input) {
        Language lang = LanguageIdentifier.identify(input);
        //
        ResultOfA resultA = componentA.doSomething(input);
        ResultOfB resultB = componentB.doSomethingElse(resultA); // uses result of A
        return componentC.doFinal(resultA, resultB); // uses result of A and B
    }
}
Now, every component of the pipeline has something inside that is language specific. For example, to analyze Chinese text I need one library, and to analyze English text I need a different one.
Moreover, some tasks can be done in one language but not in the other. One solution to this problem is to make every pipeline component abstract (implementing the common methods) and then provide a concrete, language-specific implementation. Taking component A as an example, I'd have the following:
public abstract class A {
    private CommonClass x; // common to all languages
    private AnotherCommonClass y; // common to all languages
    abstract SomeTemporaryResult getTemp(Input input); // language specific
    abstract AnotherTemporaryResult getAnotherTemp(Input input); // language specific
    public ResultOfA doSomething(Input input) {
        // template method
        SomeTemporaryResult t = getTemp(input); // language specific
        AnotherTemporaryResult tt = getAnotherTemp(input); // language specific
        return new ResultOfA(t, tt, x.get(), y.get());
    }
}
public class EnglishA extends A {
    private EnglishSpecificClass something;
    // implementation of the abstract methods ...
}
In addition, since each pipeline component is very heavy and I need to reuse them, I thought of creating a factory that caches the components for later use, with a map keyed by language, like so (the other components would work in the same manner):
public enum AFactory {
    SINGLETON;
    // This map will only have one or two keys. Is there anything more efficient than a HashMap that I can use?
    private final Map<String, A> cache = new HashMap<>();
    public A getA(Locale locale) {
        // look up by locale.getLanguage(), and insert if it doesn't exist, et cetera
        return cache.get(locale.getLanguage());
    }
}
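To make the "look up, and insert if it doesn't exist" step concrete, here is a minimal sketch of what I have in mind. It is only an illustration under assumptions: HeavyComponent, HeavyComponentFactory, and FactoryDemo are hypothetical stand-ins (not my real classes), and I'm assuming that lazy creation through ConcurrentHashMap.computeIfAbsent would be acceptable here.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a heavy, language-specific component such as A.
class HeavyComponent {
    static final AtomicInteger CREATED = new AtomicInteger();
    final String language;

    HeavyComponent(String language) {
        this.language = language;
        CREATED.incrementAndGet(); // count constructions to show the cache is used
    }
}

// Hypothetical factory: one cached instance per ISO language code.
enum HeavyComponentFactory {
    SINGLETON;

    // Thread-safe map keyed by the language code, e.g. "en" or "zh".
    private final Map<String, HeavyComponent> cache = new ConcurrentHashMap<>();

    HeavyComponent get(Locale locale) {
        // Build the component on the first request for a language, then reuse it.
        return cache.computeIfAbsent(locale.getLanguage(), HeavyComponent::new);
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        // Locale.ENGLISH and Locale.UK share the language code "en".
        HeavyComponent first = HeavyComponentFactory.SINGLETON.get(Locale.ENGLISH);
        HeavyComponent second = HeavyComponentFactory.SINGLETON.get(Locale.UK);
        System.out.println(first == second);
        System.out.println(HeavyComponent.CREATED.get());
    }
}
```

The point of the sketch is that two locales with the same language share one instance, so the heavy component is only built once per language.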
So, my question is: What do you think of this design? How can it be improved? I need the "transparency" because the language can change dynamically, based on the text being analyzed. As you can see from the runPipeline method, I first identify the language of the Input and then, based on this, I need to switch the pipeline components to the identified language. So, instead of invoking the components directly, maybe I should get them from the factory, like so:
public Output runPipeline(Input input) {
    Language lang = LanguageIdentifier.identify(input);
    ResultOfA resultA = AFactory.SINGLETON.getA(lang).doSomething(input);
    ResultOfB resultB = BFactory.SINGLETON.getB(lang).doSomethingElse(resultA);
    return CFactory.SINGLETON.getC(lang).doFinal(resultA, resultB);
}
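For anyone who wants to try the idea end to end, here is a small runnable sketch of the per-input dispatch. Everything in it is a hypothetical stand-in: Analyzer plays the role of one pipeline component, ToyLanguageIdentifier is a toy heuristic (any Han character means Chinese) rather than a real language identifier, and the tokenizers are deliberately naive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one pipeline component.
abstract class Analyzer {
    abstract String tokenize(String text); // language specific

    // Template method shared by every language.
    String analyze(String text) {
        return "[" + tokenize(text) + "]";
    }
}

class EnglishAnalyzer extends Analyzer {
    @Override
    String tokenize(String text) {
        return String.join("|", text.trim().split("\\s+")); // split on whitespace
    }
}

class ChineseAnalyzer extends Analyzer {
    @Override
    String tokenize(String text) {
        // Naive: one token per code point (a real segmenter would go here).
        return String.join("|", text.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new));
    }
}

class ToyLanguageIdentifier {
    // Toy heuristic: any Han character means "zh", otherwise "en".
    static String identify(String text) {
        boolean hasHan = text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
        return hasHan ? "zh" : "en";
    }
}

class DispatchingPipeline {
    // One prebuilt analyzer per language code, reused for every input.
    private final Map<String, Analyzer> analyzers = new HashMap<>();

    DispatchingPipeline() {
        analyzers.put("en", new EnglishAnalyzer());
        analyzers.put("zh", new ChineseAnalyzer());
    }

    String run(String input) {
        String lang = ToyLanguageIdentifier.identify(input); // decided per input
        Analyzer analyzer = analyzers.get(lang);
        return lang + ":" + analyzer.analyze(input);
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        DispatchingPipeline pipeline = new DispatchingPipeline();
        System.out.println(pipeline.run("hello world")); // prints en:[hello|world]
        System.out.println(pipeline.run("\u4f60\u597d")); // two Han characters, handled by ChineseAnalyzer
    }
}
```

The caller only ever sees run(input); which concrete analyzer handles the text is decided inside the pipeline, which is the kind of transparency I'm after.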
Thank you for reading this far. I very much appreciate every suggestion that you can make on this question.