Is there a generic approach to merging the content of an XML file (a template) that contains embedded XPath expressions with an XmlDocument?

As an example (please note this is just a simple example; I am looking for a generic approach):

File:

<root xmlns:dt="urn:schemas-microsoft-com:datatypes">
<session email='' alias=''>
 <state>
  <action>
   <attribute  in_var="" out_var="" entity_name="entity" query_name="query1"/>
   <attribute dtype="string" in_var=""  name="entity_id" value="$/data/row/entity_id$"/>
  </action>
 </state>
</session>
</root>

XmlDocument:

<data>
  <row>
    <entity_id>1</entity_id>
    <entity_name>Entity 1</entity_name>
  </row>
  <row>
    <entity_id>2</entity_id>
    <entity_name>Entity 2</entity_name>
  </row>
</data>

After Merge:

<root xmlns:dt="urn:schemas-microsoft-com:datatypes">
<session email='' alias=''>
 <state>
  <action>
   <attribute  in_var="" out_var="" entity_name="entity" query_name="query1"/>
   <attribute dtype="string" in_var=""  name="entity_id" value="1"/>
  </action>
 </state>
</session>
</root>

<root xmlns:dt="urn:schemas-microsoft-com:datatypes">
<session email='' alias=''>
 <state>
  <action>
   <attribute  in_var="" out_var="" entity_name="entity" query_name="query1"/>
   <attribute dtype="string" in_var=""  name="entity_id" value="2"/>
  </action>
 </state>
</session>
</root>

I was under the impression that regular expression backreferences can assist in this scenario but I have hit a dead end.

A: 

The fact that your template contains $/xpath/expression$ strings pretty much rules out solving this in XSLT alone: standard XSLT 1.0 and 2.0 (without extensions) cannot evaluate an XPath expression that is held in a string at run time, and the expressions you have carry no notion of rows/records.

Also I don't know of a generic/widespread way to solve it. I would probably solve it with an approach similar to this:

  • read the XML template file into a DOM, the XML data file into another DOM
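  • look for XPath expression placeholders. For example, if they are in attributes (note that ends-with() requires XPath 2.0; in XPath 1.0, test substring(., string-length(.)) = '$' instead):
    //@*[starts-with(., '$') and ends-with(., '$')]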
  • pull out all the XPath-expression strings and apply them to the data file, storing the results in a temporary data structure.

Say, your template contained these patterns:

  • "$/data/row/entity_id$"
  • "$/data/row/entity_name$"

then I would start by making a result set for each expression (pseudo JS code):

var placeholderData = {
  "$/data/row/entity_id$": ["1", "2"],
  "$/data/row/entity_name$": ["Entity 1", "Entity 2"]
};
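
To make the "result set per expression" step concrete, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not a general solution: `find_placeholders` and `build_placeholder_data` are hypothetical helper names, the sample template is a cut-down version of the one in the question, and `xml.etree.ElementTree` supports only a subset of XPath. ElementTree also cannot apply an absolute path to an element, so the data root is placed under a synthetic parent element and the path is made relative.

```python
import xml.etree.ElementTree as ET

DATA_XML = """<data>
  <row><entity_id>1</entity_id><entity_name>Entity 1</entity_name></row>
  <row><entity_id>2</entity_id><entity_name>Entity 2</entity_name></row>
</data>"""

# Simplified template for illustration (not the full file from the question).
TEMPLATE_XML = """<root>
  <attribute name="entity_id" value="$/data/row/entity_id$"/>
  <attribute name="entity_name" value="$/data/row/entity_name$"/>
  <attribute name="fixed" value="unchanged"/>
</root>"""


def find_placeholders(template_root):
    """Return the distinct attribute values of the form $...$, in document order."""
    found = []
    for element in template_root.iter():
        for value in element.attrib.values():
            is_placeholder = len(value) > 2 and value.startswith("$") and value.endswith("$")
            if is_placeholder and value not in found:
                found.append(value)
    return found


def build_placeholder_data(template_root, data_root):
    """Map each placeholder to the list of text values its path selects."""
    # Assumption: a synthetic parent lets "./data/..." reach the real root element.
    holder = ET.Element("holder")
    holder.append(data_root)
    result = {}
    for placeholder in find_placeholders(template_root):
        path = "." + placeholder[1:-1]  # "$/data/row/x$" -> "./data/row/x"
        result[placeholder] = [node.text or "" for node in holder.findall(path)]
    return result


placeholder_data = build_placeholder_data(ET.fromstring(TEMPLATE_XML), ET.fromstring(DATA_XML))
print(placeholder_data)
```

The dictionary keys keep the surrounding dollar signs so they can be compared directly with the attribute values in the template.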

Then, I would make a loop over the <row>s (pseudo code, again):

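// Pseudo code, assuming an MSXML-style DOM API (selectNodes, item, text, cloneNode, save).
var rows = dataXml.selectNodes("/data/row");
// XPath 1.0 has no ends-with(), so the last character is tested with substring().
var placeholderXpath = "//@*[starts-with(., '$') and substring(., string-length(.)) = '$']";

for (var i = 0; i < rows.length; i++)
{
  // work on a fresh deep copy of the template for every row
  var currentTemplate = templateXml.cloneNode(true);
  var attributeNodes = currentTemplate.selectNodes(placeholderXpath);
  for (var j = 0; j < attributeNodes.length; j++)
  {
    var attributeNode = attributeNodes.item(j);
    var values = placeholderData[attributeNode.text] || [];
    // use the i-th value of the result set, or an empty string if it is too short
    attributeNode.text = (values.length > i) ? values[i] : "";
  }
  currentTemplate.save("output_" + i + ".xml");
}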
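
For readers who want to try the row loop end to end, the following is a small runnable Python sketch of the same idea using only the standard library. The function name `merge_rows` is hypothetical, the result sets are hard-coded to mirror the placeholderData structure shown earlier, and the template omits the `xmlns:dt` declaration for brevity. It returns the merged documents instead of writing files.

```python
import copy
import xml.etree.ElementTree as ET

TEMPLATE_XML = """<root>
<session email="" alias="">
 <state>
  <action>
   <attribute in_var="" out_var="" entity_name="entity" query_name="query1"/>
   <attribute dtype="string" in_var="" name="entity_id" value="$/data/row/entity_id$"/>
  </action>
 </state>
</session>
</root>"""

# Result sets per placeholder, hard-coded here for illustration.
PLACEHOLDER_DATA = {
    "$/data/row/entity_id$": ["1", "2"],
    "$/data/row/entity_name$": ["Entity 1", "Entity 2"],
}


def merge_rows(template_root, placeholder_data, row_count):
    """Return one filled-in deep copy of the template per row index."""
    documents = []
    for i in range(row_count):
        current = copy.deepcopy(template_root)
        for element in current.iter():
            # Iterate over a snapshot because attributes are updated in the loop.
            for name, value in list(element.attrib.items()):
                if value in placeholder_data:
                    values = placeholder_data[value]
                    # Fall back to an empty string when a result set is too short.
                    element.set(name, values[i] if i < len(values) else "")
        documents.append(current)
    return documents


merged = merge_rows(ET.fromstring(TEMPLATE_XML), PLACEHOLDER_DATA, row_count=2)
for i, document in enumerate(merged):
    print(f"--- output_{i}.xml ---")
    print(ET.tostring(document, encoding="unicode"))
```

Attributes that are not placeholders, such as `query_name`, pass through unchanged because only exact matches against the placeholder keys are replaced.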

If the "$/xpath/expression$" placeholders can show up pretty much anywhere (instead of in attribute values alone), the whole thing gets a bit more complicated, of course. The general approach would probably still work.
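
As a rough illustration of the "placeholders anywhere" case, the Python sketch below substitutes placeholders in attribute values and in text content, including placeholders embedded in longer strings. It assumes the values for the current row have already been looked up; `fill` and `fill_everywhere` are hypothetical names, and element names, comments and processing instructions are deliberately not handled.

```python
import re
import xml.etree.ElementTree as ET

# A placeholder is a '$', a path starting with '/', and a closing '$'.
PLACEHOLDER = re.compile(r"\$(/[^$]*)\$")


def fill(text, values_for_row):
    """Replace every $path$ in text with that path's value for the current row."""
    if not text:
        return text
    # Unknown placeholders are replaced by an empty string.
    return PLACEHOLDER.sub(lambda m: values_for_row.get(m.group(0), ""), text)


def fill_everywhere(root, values_for_row):
    """Substitute placeholders in attribute values, element text and tail text."""
    for element in root.iter():
        for name, value in list(element.attrib.items()):
            element.set(name, fill(value, values_for_row))
        element.text = fill(element.text, values_for_row)
        element.tail = fill(element.tail, values_for_row)
    return root


template = ET.fromstring(
    '<note id="n-$/data/row/entity_id$">Name: $/data/row/entity_name$'
    '<br/> (id $/data/row/entity_id$)</note>'
)
row_values = {"$/data/row/entity_id$": "1", "$/data/row/entity_name$": "Entity 1"}
filled = fill_everywhere(template, row_values)
print(ET.tostring(filled, encoding="unicode"))
```

In ElementTree, text that follows a child element is stored in that child's `tail`, which is why both `text` and `tail` are processed.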

Tomalak
Depending on the environment (and the feasibility of using the non-portable features it provides), evaluating XPath dynamically might not be a problem - EXSLT has `dyn:evaluate`, Saxon has something similar, and .NET `XslCompiledTransform` can use `msxsl:script` blocks written in C#/VB, which can delegate to `XPathNavigator` to evaluate an expression in essentially one line.
Pavel Minaev
A: 

It's an interesting problem. I assume that $/some/path$ will always be replaced with the value of the elements returned by an XPath query? I think the "File" must be processed as a string. Yes, it's XML, but if that pattern holds true, it's much simpler this way: it is then just macro substitution.

In that case, one solution would be (Scala script):

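import scala.xml.{Node, NodeSeq}

// Matches $/some/path$; group 1 is the path without the surrounding dollar signs.
val pattern = """\$([\w/]*)\$""".r

// Distinct placeholder paths found in the template text.
def patterns(s: String): List[String] =
  pattern.findAllMatchIn(s).map(_.group(1)).toList.distinct

// "/data/row" becomes List(List("/", "data"), List("/", "row")).
def pathComponents(path: String): List[List[String]] =
  path.split("""\b(?!\w)""").toList.map(_.split("\\b").toList)

// Walks the path one step at a time and returns the text of the matching nodes.
def lookUp(xml: Node, path: List[List[String]]): Seq[String] =
  path.foldLeft(xml: NodeSeq) { (nodes, pathComponent) =>
    pathComponent match {
      case List("/", component)  => nodes \ component
      case List("//", component) => nodes \\ component
      case other => throw new IllegalArgumentException("Unsupported path step: " + other.mkString)
    }
  }.map(_.text)

def pathAndValues(s: String, xml: Node): List[(String, Seq[String])] =
  patterns(s).map(path => path -> lookUp(xml, pathComponents(path)))

// Produces one copy of the text per combination of values.
def merge(s: String, xml: Node): List[String] =
  pathAndValues(s, xml).foldLeft(List(s)) { case (files, (path, values)) =>
    for (file <- files; value <- values)
      yield file.replace("$" + path + "$", value)
  }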

You then read XmlDocument into xml, and the file to be merged into a String. This, of course, assumes the file is not too big to be handled this way. In Scala, it could be done like this:

// mkString on the Source itself keeps the line breaks of the template
merge(scala.io.Source.fromFile(filename).mkString,
      scala.xml.XML.loadFile(XmlDocumentFilename))

That will return a list containing one merged document for every possible combination of substituted values.

If these files are too big to keep in memory, it will be necessary to generate each possible permutation for the values to be substituted, so that you need one pass only to replace all paths for each permutation.
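
The one-pass idea can be sketched in Python as follows. This is only an illustration under assumptions: `merge_combinations` is a hypothetical name, the values per path are assumed to have been looked up already, and the template is held as a single string for brevity. `itertools.product` enumerates the combinations lazily, and a single regex pass fills in every placeholder for each combination.

```python
import itertools
import re

# Same placeholder shape as in the answer: $ followed by word characters and slashes, then $.
PLACEHOLDER = re.compile(r"\$([\w/]*)\$")


def merge_combinations(template_text, values_by_path):
    """Yield the template once per combination of values, one pass per combination."""
    paths = sorted(values_by_path)  # fixed order so each combination lines up with the paths
    for combination in itertools.product(*(values_by_path[p] for p in paths)):
        chosen = dict(zip(paths, combination))
        # Unknown placeholders are left untouched (m.group(0) is the original text).
        yield PLACEHOLDER.sub(lambda m: chosen.get(m.group(1), m.group(0)), template_text)


template = 'id="$/data/row/entity_id$" name="$/data/row/entity_name$"'
values = {
    "/data/row/entity_id": ["1", "2"],
    "/data/row/entity_name": ["Entity 1", "Entity 2"],
}
results = list(merge_combinations(template, values))
for text in results:
    print(text)
```

For a template that really is too large for memory, the same substitution could be applied line by line while streaming from the input file to one output file per combination.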

If the XPaths are true XPaths, and not just limited to "/" and "//", this solution won't do as is. It will have to be converted to use a true XPath library. Also, note that "/" looks for a child of the node it is applied to, so if <data> is the root element loaded into xml, /data will find nothing.

Daniel