I am currently developing a system for translating between data files of different types. Some of these files are just a single stream of data, some have multiple columns of data, and some have multiple channels of data (i.e. can contain multiple streams and types of data embedded). The user would be allowed to select any file type as the source, and any file type as the target, or in the case of multiple-stream sources, several file types as targets.

Since each file type has different capabilities, I am considering creating something that can examine the source file type's interface/specification, discover its available capabilities, do the same with the target file, and then automatically wire up those capabilities that the two file types have in common.
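
To make the idea concrete, here is a minimal sketch of the "discover what both sides have in common" step. It assumes each file type can describe itself as a set of capability names; the `FileType` class and the capability names are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileType:
    """Hypothetical description of a file type and what it can carry."""
    name: str
    capabilities: frozenset  # capability names are made up for this sketch


def common_capabilities(source, target):
    """Return the capabilities both file types support, sorted for stable output."""
    return sorted(source.capabilities & target.capabilities)


# Hypothetical file types, one per kind mentioned above.
multi_channel = FileType("multi_channel", frozenset({"audio", "video", "timestamps"}))
single_stream = FileType("single_stream", frozenset({"audio"}))
columnar = FileType("columnar", frozenset({"timestamps", "audio"}))

print(common_capabilities(multi_channel, columnar))       # ['audio', 'timestamps']
print(common_capabilities(multi_channel, single_stream))  # ['audio']
```

Only the capabilities in the intersection would be candidates for wiring; how each one is actually transferred is the open part of the question.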

Is there an established software pattern, open source framework, or design methodology that already does this, or something similar?

+1  A: 

Some variation of the adapter pattern may work. I'm sure you've looked into it, but I wanted to add this just in case you haven't.

http://en.wikipedia.org/wiki/Adapter_pattern

Ian P
I've had a look at some of the simple examples of the adapter pattern, but they seem too abstract to get my mind around. Will try looking for more concrete examples.
Robert Harvey
+1  A: 

I recommend that you review some of the Enterprise Integration Patterns available, especially those dealing with integrating systems with multiple interfaces. You might get some useful insights for your case there.

Regards.

StudiousJoseph
The File Transfer pattern seems to be the only one there that fits the bill. The rest of the patterns deal with messaging systems, which could be conceptually valuable, but I see potential performance problems with that approach.
Robert Harvey
+1  A: 

I am not an expert in this subject, but it looks like you are trying to develop a system similar to BizTalk Server, or perhaps an ETL/ELT system. An adapter pattern at the ends (where you need a stream adapter for each stream type) would make sense, but I don't think there is a ready-made pattern for the core of the system as a whole. I would advise you to look into some of the open source applications, like Talend, to see how they approached this problem, since all ETL and integration systems have to deal with similar issues.

Bhuvan
Thanks for the link to Talend, although it may be a last resort, as it is a sea of code.
Robert Harvey
+1  A: 

Yes, I would say that Ian P's adapter pattern (http://en.wikipedia.org/wiki/Adapter_pattern) is the correct answer; however, some examples might be helpful. Each file type should have a converter, and each converter converts to and from a common "type" (the common type is a superset of all types, so it can represent anything). You convert each source type to the superset, and to convert from the superset to another type you simply pick the target type and let its converter do the work. Common examples of this pattern are: 1) all operating system drivers (device, video, printer, etc.); 2) all Internet protocols (each implementation may do something different, but in the end it comes down to the protocol as the "superset type").
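
A rough sketch of that idea in Python, under the assumption that the common type can be modelled as a dictionary of named channels. The names `CommonData`, `SingleStreamConverter`, `ColumnsConverter`, and `convert` are all made up for illustration and do not come from any library.

```python
from dataclasses import dataclass, field


@dataclass
class CommonData:
    """Hypothetical superset type: named channels, each a list of values."""
    channels: dict = field(default_factory=dict)


class SingleStreamConverter:
    """Converter for a file type that holds exactly one stream."""

    def to_common(self, values):
        return CommonData({"stream": list(values)})

    def from_common(self, data):
        # A single-stream file can only keep one channel; take the first.
        if not data.channels:
            return []
        return list(next(iter(data.channels.values())))


class ColumnsConverter:
    """Converter for a file type that holds several named columns."""

    def __init__(self, column_names, default=0):
        self.column_names = list(column_names)
        self.default = default

    def to_common(self, rows):
        return CommonData(
            {name: [row[i] for row in rows] for i, name in enumerate(self.column_names)}
        )

    def from_common(self, data):
        length = max((len(v) for v in data.channels.values()), default=0)
        columns = []
        for name in self.column_names:
            values = list(data.channels.get(name, []))
            # Columns the source could not supply are filled with the default.
            columns.append(values + [self.default] * (length - len(values)))
        return [tuple(col[i] for col in columns) for i in range(length)]


def convert(payload, source, target):
    """Always go through the common type, so n file types need n converters."""
    return target.from_common(source.to_common(payload))


single = SingleStreamConverter()
table = ColumnsConverter(["stream", "gain"], default=0)
print(convert([1, 2, 3], single, table))         # [(1, 0), (2, 0), (3, 0)]
print(convert([(1, 10), (2, 20)], table, single))  # [1, 2]
```

Each new file type adds one converter rather than one converter per existing type; the price is that the common type has to be rich enough for all of them, and missing data has to be filled with defaults.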

David T
So it sounds like I need the mother of all objects as an interface, something that can represent every possible input and output. I would have been happier if you told me it was possible by combining objects from a catalog or collection of objects representing each file type.
Robert Harvey
Well, there is no one-size-fits-all solution without a one-size-fits-all object or interface. If you are trying to convert a file with one column to a file with 99 columns, you will need 98 defaults :-)
David T
+1  A: 

A bit of a long shot, but maybe MEF can help you to some extent?

Edit: Actually, your requirement sounds just like the DirectShow filter graph.

Rei Miyasaka
MEF is a possibility. Some of the file types do contain video, so the filter graphs could come in handy.
Robert Harvey
+1  A: 

There are two problems to solve: one is to look at a file and see which format it contains; the other is how to map inputs and outputs correctly once their capabilities are known.

I would solve both problems with a rule-based approach, that is, a Chain of Responsibility.

Start with the simplest case, say a single stream in input and a single stream in output. Write an object that recognizes that case and wires the system accordingly.

Then consider the next simplest case, and write an object that recognizes and solves that.

You end up with a list of "rule" objects. Your system tries all of them from the most specific to the most general. The first rule that is able to solve the current problem is applied.
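
A minimal sketch of such a rule list, assuming sources and targets are described simply as lists of stream names. The rule classes and the `wire_up` function are hypothetical names chosen for this example.

```python
class SingleToSingleRule:
    """Most specific case: one stream in, one stream out."""

    def applies(self, source_streams, target_streams):
        return len(source_streams) == 1 and len(target_streams) == 1

    def wire(self, source_streams, target_streams):
        return [(source_streams[0], target_streams[0])]


class MatchByNameRule:
    """Next case: connect streams that carry the same name on both sides."""

    def applies(self, source_streams, target_streams):
        return any(name in target_streams for name in source_streams)

    def wire(self, source_streams, target_streams):
        return [(name, name) for name in source_streams if name in target_streams]


class FallbackRule:
    """Most general case: always applies, wires nothing."""

    def applies(self, source_streams, target_streams):
        return True

    def wire(self, source_streams, target_streams):
        return []


# Ordered from the most specific rule to the most general one.
RULES = [SingleToSingleRule(), MatchByNameRule(), FallbackRule()]


def wire_up(source_streams, target_streams, rules=RULES):
    """Apply the first rule in the chain that recognizes the situation."""
    for rule in rules:
        if rule.applies(source_streams, target_streams):
            return rule.wire(source_streams, target_streams)
    raise LookupError("no rule could handle this combination")


print(wire_up(["data"], ["out"]))                      # [('data', 'out')]
print(wire_up(["audio", "video"], ["video", "text"]))  # [('video', 'video')]
```

New cases are handled by adding a rule object to the list in the right position, without touching the existing ones.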

xpmatteo
That might work, if I could apply more than one rule for a given file transfer.
Robert Harvey
Most rule-based systems stop after finding the first applicable rule, but it's up to you to decide what your system will do. You might apply all applicable rules. You might have several chains of rules, so that only the first applicable rule within a chain is applied, but more than one chain can be applied at the same time.
xpmatteo
+1  A: 

integration and mapping problems like this are difficult because data formats are not self-explanatory. solutions i have seen so far are very specialized compared to your idea.

if i got you right this would be an easy example: a user app sends an email with attachments (gif picture and mp3 audio) to your application and wants the message as pdf, the gif as jpeg and the mp3 as aac. in this example i ask myself why the app does not send the three files separately, as this greatly simplifies the solution. is there a reason why the information from the user is not taken into account? maybe you can give me a better example where i see that there is no easy way to separate the data. please provide some more information on the processing: are you working with data or objects? what do you know about the format? is everything allowed or can you restrict the type of the data (e.g. only tables)? how do you obtain the specification information?

i agree with xpmatteo so far that there are two problems to solve. however, i see no need to compute the source file format via analysis (even if it is possible for some formats, e.g. from header data). the first thing i thought about was to send something like "source format, data". in your application you have something (e.g. a map) that can quickly find an appropriate mapping/converter and the possible outputs.

for the second problem you would need something like a meta model (for instance describing that sql data and excel sheets can both be viewed as tables) or the approach mentioned by David T (where you only look for a converter path from source to target). if all formats have something in common, as the examples from David have, you may find a relatively easy model of models. otherwise it will become complex and probably unusable.
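
looking for a converter path can be sketched as a breadth-first search over a graph of direct converters. this is only an illustration under assumptions: the format names and the `CONVERTERS` table are hypothetical, and "table" stands in for the kind of shared meta model mentioned above.

```python
from collections import deque

# Hypothetical registry: format -> formats it can be converted to directly.
CONVERTERS = {
    "csv": ["table"],
    "xls": ["table"],
    "table": ["csv", "sql"],
    "sql": ["table"],
}


def find_converter_path(source, target, converters=CONVERTERS):
    """Return the shortest chain of formats from source to target, or None."""
    if source == target:
        return [source]
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        for next_format in converters.get(path[-1], []):
            if next_format in seen:
                continue
            if next_format == target:
                return path + [next_format]
            seen.add(next_format)
            queue.append(path + [next_format])
    return None  # no chain of converters connects the two formats


print(find_converter_path("xls", "sql"))  # ['xls', 'table', 'sql']
print(find_converter_path("sql", "xls"))  # None
```

in this made-up table nothing converts to "xls", so the second lookup finds no path; the user would then be told that this target is not available for that source.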

javalympics
I only have the second problem. The first problem I intend to write converters for specifically; that works because I can just turn everything into a generic stream. The second problem is a problem because it is an n factorial space; so hopefully I can find some way to prune the problem so that it grows linearly, rather than exponentially.
Robert Harvey
i need more information for suggestions. it sounds as if you have already decided many things about the solution. is the approach to put the input through every converter (and follow-up container) producing every possible output format and have a look at each generated file if it is valid?
javalympics