In our work as web developers for a meteorological company, we face the same task over and over again: get some files from somewhere (FTP/Web/directory/mail) and import the contained data into a database.

Of course the file format is never the same, the databases are always designed differently, countless special cases have to be handled, etc, etc.

So now I'm planning an import framework for exactly this kind of work. Since we're all experienced PHP developers and the current scripts are either PHP or Perl, we'll stick with PHP as the scripting language.

  • A data getter will fetch the file from the source, open it and store the content into a string variable. (Don't worry, PHP will get enough memory from us.)
  • The data handler will do the complicated work to convert the string into some kind of array.
  • The array will be saved to the database or written to a new file or whatever we're supposed to do with it.

Along with this functionality there will be some common error handling, log writing and email reporting.

The idea is to use a collection of classes (some getter classes, a lot of specialised handlers, some writer classes).

My question: How do I practically organize these classes in a working script? Do I invent some kind of meta language which will be interpreted, with the classes then called accordingly? Or do I just provide some simple interfaces these classes have to implement, so that my users (like I said: experienced PHP developers) write small PHP scripts loading these classes?

The second version almost certainly offers the most flexibility and extensibility.
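
To make the second option concrete, here is a minimal sketch of what such interfaces and a shared runner could look like. All names (`Getter`, `Handler`, `Writer`, `runImport` and the three example classes) are made up for illustration, the semicolon-separated input format is an assumption, and the code assumes PHP 8.0 or later.

```php
<?php
// Hypothetical interfaces; the names are illustrative, not from an existing library.
interface Getter
{
    /** Fetch the raw file content as a string. */
    public function fetch(): string;
}

interface Handler
{
    /** Convert the raw content into an array of rows. */
    public function parse(string $raw): array;
}

interface Writer
{
    /** Persist the rows (database, file, ...). */
    public function write(array $rows): void;
}

/** Reads a local file; an FTP or mail getter would implement the same interface. */
final class FileGetter implements Getter
{
    public function __construct(private string $path) {}

    public function fetch(): string
    {
        $content = @file_get_contents($this->path);
        if ($content === false) {
            throw new RuntimeException("Cannot read {$this->path}");
        }
        return $content;
    }
}

/** Parses semicolon-separated lines such as "station;temperature" (assumed format). */
final class SemicolonHandler implements Handler
{
    public function parse(string $raw): array
    {
        $rows = [];
        foreach (preg_split('/\R/', trim($raw)) as $line) {
            if ($line !== '') {
                $rows[] = explode(';', $line);
            }
        }
        return $rows;
    }
}

/** Collects rows in memory; a real writer would talk to the database. */
final class ArrayWriter implements Writer
{
    public array $rows = [];

    public function write(array $rows): void
    {
        $this->rows = array_merge($this->rows, $rows);
    }
}

/** Shared glue: runs the three steps and handles failures in one place. */
function runImport(Getter $getter, Handler $handler, Writer $writer): bool
{
    try {
        $writer->write($handler->parse($getter->fetch()));
        return true;
    } catch (Throwable $e) {
        // Hook for the common log writing and email reporting.
        error_log('Import failed: ' . $e->getMessage());
        return false;
    }
}
```

A user script would then boil down to picking the three parts and making one call, for example `runImport(new FileGetter($path), new SemicolonHandler(), new ArrayWriter())`, while the error handling stays in the framework.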

Do you have any other ideas concerning such an undertaking?

+2  A: 

I suggest borrowing concepts from Data Transformation Services (DTS). You could have data sources and data sinks, import tasks, transformation tasks and so on.
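
As a rough sketch of how those concepts might translate to PHP (this is not how DTS itself works, and every name here is hypothetical): a source yields records, transformation tasks are plain callables applied in order, and a sink receives whatever survives. The "station;temp" line format and the plausibility limits are assumptions for the example.

```php
<?php
/** Source: yields one associative record per input line ("station;temp"). */
function lineSource(string $raw): Generator
{
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        if ($line === '') {
            continue;
        }
        [$station, $temp] = explode(';', $line);
        yield ['station' => $station, 'temp' => $temp];
    }
}

/** Runs every record through the tasks; a task returning null drops the record. */
function runTasks(iterable $source, array $tasks, callable $sink): int
{
    $count = 0;
    foreach ($source as $record) {
        foreach ($tasks as $task) {
            $record = $task($record);
            if ($record === null) {
                continue 2; // skip to the next record
            }
        }
        $sink($record);
        $count++;
    }
    return $count;
}

// Two example transformation tasks: cast the value, then filter implausible ones.
$castTemp = fn(array $r): array => ['station' => $r['station'], 'temp' => (float) $r['temp']];
$dropImplausible = fn(array $r): ?array => ($r['temp'] > -90.0 && $r['temp'] < 60.0) ? $r : null;

// Sink: collects records in memory; a real sink would insert into the database.
$loaded = [];
$sink = function (array $r) use (&$loaded): void {
    $loaded[] = $r;
};

$count = runTasks(lineSource("ZRH;12.5\nBRN;999\nGVA;-3"), [$castTemp, $dropImplausible], $sink);
```

Because each task only sees one record, special cases can be added or removed per import without touching the source or the sink.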

Pavel Chuchuva
Thanks for the hint, I'll read up on it. And it pointed me in the right direction, toward an acronym: ETL (Extract, Transform, Load).
christian studer
+1  A: 

Working in a similar environment, with dozens of different external data formats that need to be imported and exported, I recommend at least trying to get your suppliers to unify their data formats. We had some success by developing tools that help others outside our company transform their data into our format. We also gave them the source code, for free.

Some others are now transforming their data for us using our tools, and if they change their format, they are the ones who change the transformation tool. One less headache for us.

In one case it even led to another company switching to the file format our systems use internally. Granted, it is only one case, but I consider it a first step on a long road ;-)

Treb
christian studer
A: 

Is there a reason why defining a standard web service wouldn't work here? Then you can supply the data in a standard format, returning a SOAP error (possibly populated by a field in the input document) if there's a fault.

It's potentially more limited than Pavel's suggestion (or would require more up-front design), but might be something worth considering.

Robert Grant
Like I noted in a comment to Treb: We can't demand too much from our data suppliers and are in a position of weakness towards them. :-( Or they simply don't have the knowledge to send Matlab or IDL output to a web service. Pragmatically, we'll have to do the work on our side.
christian studer