tags:

views:

560

answers:

2

I am using scala to load a XML file from file via the scala.xml.XML.loadFile() method. The documents I'm working with have namespaces already defined and I wish to change the namespace to something else using scala. For example, a document has a xmlns of "http://foo.com/a" with a prefix of "a" - I would like to change the namespace and prefix for the document to "http://foo.com/b" and "b" respectively.

Seems easy and I feel like I'm missing something obvious here. I do not have a problem getting the namespace from the return Elem from the referenced loadFile() method.

+4  A: 

Here it is. Since NamespaceBinding is nested (each ns has a parent, except TopScope), we need to recurse to fix that. Also, each ns has an URI and a prefix, and we need to change both.

The function below will change just one particular URI and prefix, and it will check all namespaces, to see if either prefix or URI needs changing. It will change a prefix or a URI independent of each other, which might not be what is wanted. Not a big deal fixing that, though.

As for the rest, just pattern match on Elem to recurse into each part of the XML. Ah, yes, it changes the prefix of the elements too. Again, if that's not what is wanted, it's easy to change.

The code assumes there is no need to recurse into "other" parts of XML -- the rest will usually be Text elements. Also, it assumes there is no namespace elsewhere. I'm no expert on XML, so I could be wrong on both counts. Once more, it should be easy to change that -- just follow the pattern.

def changeNS(el: Elem, 
             oldURI: String, newURI: String, 
             oldPrefix: String, newPrefix: String): Elem = {
  def replace(what: String, before: String, after: String): String =
    if (what == before) after else what

  def fixScope(ns: NamespaceBinding): NamespaceBinding =
    if(ns == TopScope)
      TopScope
    else new NamespaceBinding(replace(ns.prefix, oldPrefix, newPrefix),
                              replace(ns.uri, oldURI, newURI),
                              fixScope(ns.parent))

  def fixSeq(ns: Seq[Node]): Seq[Node] = for(node <- ns) yield node match {
    case Elem(prefix, label, attribs, scope, children @ _*) => 
      Elem(replace(prefix, oldPrefix, newPrefix), 
           label, 
           attribs, 
           fixScope(scope), 
           fixSeq(children) : _*)
    case other => other
  }
  fixSeq(el.theSeq)(0).asInstanceOf[Elem]
}

This produces an unexpected result, though. The scope is getting added to all elements. That's because NamespaceBinding does not define a equals method, thus using reference equality. I have opened a ticket for it, 2138, which has already been closed, so Scala 2.8 won't have this problem.

Meanwhile, the following code will work properly. It keeps a cache of namespaces. It also decomposes NamespaceBinding into a list before handling it.

  def changeNS(el: Elem, 
             oldURI: String, newURI: String, 
             oldPrefix: String, newPrefix: String): Elem = {
  val namespaces = scala.collection.mutable.Map.empty[List[(String, String)],NamespaceBinding]

  def replace(what: String, before: String, after: String): String =
    if (what == before) after else what

  def unfoldNS(ns: NamespaceBinding): List[(String, String)] = ns match {
    case TopScope => Nil
    case _ => (ns.prefix, ns.uri) :: unfoldNS(ns.parent)
    }

  def foldNS(unfoldedNS: List[(String, String)]): NamespaceBinding = unfoldedNS match {
    case knownNS if namespaces.isDefinedAt(knownNS) => namespaces(knownNS)
    case (prefix, uri) :: tail =>
      val newNS = new NamespaceBinding(prefix, uri, foldNS(tail))
      namespaces(unfoldedNS) = newNS
      newNS
    case Nil => TopScope
  }

  def fixScope(ns: NamespaceBinding): NamespaceBinding =
    if(ns == TopScope)
      ns
    else {
      val unfoldedNS = unfoldNS(ns)
      val fixedNS = for((prefix, uri) <- unfoldedNS) 
                    yield (replace(prefix, oldPrefix, newPrefix), replace(uri, oldURI, newURI))

      if(!namespaces.isDefinedAt(unfoldedNS))
        namespaces(unfoldedNS) = ns  // Save for future use

      if(fixedNS == unfoldedNS)
        ns
      else 
        foldNS(fixedNS)
    }

  def fixSeq(ns: Seq[Node]): Seq[Node] = for(node <- ns) yield node match {
    case Elem(prefix, label, attribs, scope, children @ _*) => 
      Elem(replace(prefix, oldPrefix, newPrefix), 
           label, 
           attribs, 
           fixScope(scope), 
           fixSeq(children) : _*)
    case other => other
  }
  fixSeq(el.theSeq)(0).asInstanceOf[Elem]
}
Daniel
A: 

Minor bug here. Attributes can also have qualified names. You need to check those as well.

MikeB