I have a task at work that involves converting legacy SGML (.sgm) files into XML. The SGM files were created using 5 separate high-level tags, while the new DTD has about 8-12 top-level tags that the old ones need to be mapped to. The two DTDs share some tags, but they differ enough that manually copying and pasting data from one to the other doesn't make sense.

In addition, there is linking information that needs to be translated from the legacy format into the newer format. I am currently leaning towards the following high-level approach.

  1. Convert SGM to well-formed XML.
  2. Read in the XML files and create a mapping template from the existing file types to the new file type. Each file will have metadata fields, with defaults used for the majority of the values. This mapping file will be used to drive the final conversion into the target XML. I want a tool here that is fairly bulletproof for data entry and uses drop-down lists for the metadata choices, so I am looking at creating a desktop application.
  3. Convert the XML using XSLT.
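To make step 2 a bit more concrete, here is a minimal sketch of a mapping template driving a conversion. It uses Python's standard-library `xml.etree.ElementTree` as a stand-in for the XSLT step, and every tag name, metadata field, and default value in it (`procedure`, `task`, `audience`, `status`, and so on) is hypothetical, not taken from either DTD.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical mapping template: legacy top-level tag -> target tag plus
# metadata defaults. In the real tool this would come from the file that
# the data-entry application produces.
MAPPING = {
    "procedure": {"target": "task",
                  "meta": {"audience": "technician", "status": "draft"}},
    "descript": {"target": "concept",
                 "meta": {"audience": "general", "status": "draft"}},
}


def convert(legacy_xml: str, mapping: dict,
            overrides: Optional[dict] = None) -> str:
    """Rename the top-level element and prepend a metadata block."""
    root = ET.fromstring(legacy_xml)
    rule = mapping.get(root.tag)
    if rule is None:
        raise KeyError(f"no mapping rule for top-level tag <{root.tag}>")

    # Defaults first, then any per-file values entered by the user.
    meta = {**rule["meta"], **(overrides or {})}

    new_root = ET.Element(rule["target"], dict(root.attrib))
    meta_el = ET.SubElement(new_root, "metadata")
    for name, value in sorted(meta.items()):
        ET.SubElement(meta_el, name).text = value

    # Carry over child elements unchanged; text placed directly inside the
    # top-level element is not copied in this sketch.
    for child in root:
        new_root.append(child)
    return ET.tostring(new_root, encoding="unicode")


if __name__ == "__main__":
    sample = "<procedure id='p1'><title>Reset</title></procedure>"
    print(convert(sample, MAPPING, {"status": "final"}))
```

The same table of defaults and overrides could just as well be passed to an XSLT stylesheet as parameters; the point is only that the per-file choices live in data rather than in the transform itself.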

I am curious whether anyone else has experience with this type of conversion. Does this high-level approach seem viable? Are there other ways to view this problem? Because of my own time limitations, I am looking at hiring another developer to do the coding for this project. I have used XSLT, but I have no recent experience with desktop application development, so I don't know which languages offer a good interface to XSLT and can also provide a good front-end experience for the end user.

I appreciate whatever help and comments people can provide, and will be glad to clarify further what I am looking for.

+1  A: 

That is precisely how I would do it. You are really doing three different things here: converting from SGML to XML, converting from XML to a different schema, and mixing in new data. So doing it in three separate steps is the right way to go.

Peter Eisentraut
It's good to have some confirmation that I'm not completely off base with my approach. My next challenge is deciding which implementation language to use for the mapping and for driving the XML conversion. If I were doing the work myself I'd probably use PHP at the command line, but I need something more robust for other people to use. I will have to do some more research on the languages and skill sets available for this type of problem.
A tool such as sx (sometimes called osx or sgml2xml) can do the conversion, but it messes up the formatting of the files, so you can't reasonably hand-edit them afterwards. Since you plan to convert them to a different XML schema afterwards anyway (XSLT?), this shouldn't matter.
Peter Eisentraut
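As a rough illustration of wiring the first step into a script, the sketch below calls the converter and then checks only that the result is well-formed, since its layout does not matter for the later transform. It assumes the tool is installed on the PATH under the name `osx` and writes the XML to standard output; no command-line options are used, and the helper names (`osx_command`, `is_well_formed`, `sgml_to_xml`) are made up for this example.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET


def osx_command(sgml_path: str) -> list:
    # Assumption: the converter is available as "osx"; on some systems the
    # same tool is installed as "sx" or "sgml2xml" instead.
    return ["osx", sgml_path]


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as XML, regardless of formatting."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


def sgml_to_xml(sgml_path: str) -> bytes:
    """Run the converter on one file and return its XML output."""
    if shutil.which("osx") is None:
        raise RuntimeError("osx not found on PATH")
    # The exit status is not inspected here; the parse check below decides
    # whether the output is usable for the next step.
    result = subprocess.run(osx_command(sgml_path), capture_output=True)
    if not is_well_formed(result.stdout):
        raise ValueError(f"output for {sgml_path} is not well-formed XML")
    return result.stdout
```

Depending on the files, the converter may also need an SGML declaration or catalog to find the legacy DTD; that setup is left out here.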