Now that I know how to parse XML in Scala as a stream, I need help understanding a non-trivial example.

I'd like to parse the following XML as a stream and send a message (print to the console for this example) whenever I've parsed a complete message.

I understand that stream-based parsing in Scala uses case classes to handle the different elements, but I'm just getting started and I don't quite understand how to do this.

I have this working in Java using a StAX parser, and I'm trying to translate that into Scala.

Any help would be greatly appreciated.

<?xml version="1.0" ?>
<messages>
<message>
   <to>[email protected]</to>
   <from>[email protected]</from>
   <subject>Hi Nice</subject>
   <body>Hello this is a truly nice message!</body>
</message>
<message>
   <to>[email protected]</to>
   <from>[email protected]</from>
   <subject>Hi Nice</subject>
   <body>Hello this is a truly nice message!</body>
</message>
</messages>
+4  A: 

This is for Scala 2.8.

The typical way to process events is to use a match expression. In my case, I always needed to keep track of the parent elements as I processed events (to know, for instance, which tag a piece of text is located in):

import scala.xml.pull._
import scala.io.Source
import scala.collection.mutable.Stack

// xml is assumed to be a String holding the document from the question
val src = Source.fromString(xml)
val er = new XMLEventReader(src)
// stack of currently open elements; its depth drives the indentation
val stack = Stack[XMLEvent]()
def iprintln(s: String) = println((" " * stack.size) + s.trim)
while (er.hasNext) {
  er.next match {
    case x @ EvElemStart(_, label, _, _) =>
      stack push x
      iprintln("got <" + label + " ...>")
    case EvElemEnd(_, label) =>
      iprintln("got </" + label + ">")
      stack.pop()
    case EvText(text) =>
      iprintln(text)
    case EvEntityRef(entity) =>
      iprintln(entity)
    case _ => // ignore everything else
  }
}

Because entity references arrive as separate events, you will probably need to convert them to text and combine them with the surrounding text.
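
As a rough sketch of that conversion (not part of the original answer): the helper below maps the five entity names predefined by XML to their characters and leaves anything else as a literal reference. The names EntityText and unquote are illustrative only, not a scala.xml API.

```scala
object EntityText {
  // The five entities predefined by the XML specification.
  private val predefined = Map(
    "amp"  -> "&",
    "lt"   -> "<",
    "gt"   -> ">",
    "quot" -> "\"",
    "apos" -> "'"
  )

  // Hypothetical helper: turn the name carried by an EvEntityRef event into text.
  // Unknown names are written back out as "&name;" instead of being dropped.
  def unquote(entity: String): String =
    predefined.getOrElse(entity, "&" + entity + ";")
}
```

With import EntityText._ in scope, the result of unquote(entity) can be appended to the same buffer that collects the surrounding EvText chunks.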

In the example above I only used label, but you can also bind all four fields with EvElemStart(pre, label, attrs, scope) to extract the prefix, the attributes and the namespace scope, and you can add an if guard to match more complex conditions.
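
A minimal sketch of such a guard, assuming a Scala or scala-xml version that still ships scala.xml.pull (it was dropped in scala-xml 2.0). The priority attribute and the names GuardExample, sample and highPriorityCount are made up for illustration; they do not appear in the question's XML.

```scala
import scala.io.Source
import scala.xml.pull._

object GuardExample {
  // Hypothetical input: the "priority" attribute is an assumption for this sketch.
  val sample =
    """<messages>
      |<message priority="high"><to>a</to></message>
      |<message priority="low"><to>b</to></message>
      |</messages>""".stripMargin

  // Count the <message> start tags whose priority attribute equals "high".
  def highPriorityCount(xml: String): Int = {
    val er = new XMLEventReader(Source.fromString(xml))
    var count = 0
    while (er.hasNext) {
      er.next() match {
        // attrs is bound by the pattern; the guard inspects it as a plain Map
        case EvElemStart(_, "message", attrs, _)
            if attrs.asAttrMap.get("priority") == Some("high") =>
          count += 1
        case _ => // ignore everything else
      }
    }
    count
  }
}
```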

Also, if you're using 2.7.x, I don't know whether http://lampsvn.epfl.ch/trac/scala/ticket/2583 was back-ported, so you may have trouble processing text that contains entities.

More to the point, just dealing with from and to for brevity (though I would not call that the Scala way):

class Message() {
  var to:String = _
  var from:String = _
  override def toString(): String = 
    "from %s to %s".format(from, to)
}

// er must be a fresh XMLEventReader here: a reader that has already
// been read to the end yields no more events
var message: Message = _
var sb: StringBuilder = _ // non-null only while inside <to> or <from>

while (er.hasNext) {
  er.next match {
    case EvElemStart(_, "message", _, _) =>
      message = new Message
    case EvElemStart(_, label, _, _) if
        List("to", "from") contains label =>
      sb = new StringBuilder
    case EvElemEnd(_, "to") =>
      message.to = sb.toString
      sb = null
    case EvElemEnd(_, "from") =>
      message.from = sb.toString
      sb = null
    case EvElemEnd(_, "message") =>
      println(message)
    case EvText(text) if sb != null =>
      sb ++= text
    case EvEntityRef(entity) if sb != null =>
      // unquote is a placeholder (todo): it must map an entity name to its text
      sb ++= unquote(entity)
    case _ => // ignore everything else
  }
}
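
One possible more functional variant, offered only as a sketch and not as part of the original answer: the mutable fields are replaced by an immutable state that is threaded through the events with scanLeft, and each finished message is emitted as the iterator reaches its closing tag. It assumes scala.xml.pull is available (it was dropped in scala-xml 2.0) and ignores entity references for brevity. The names FoldMessages, State, step and parse are illustrative.

```scala
import scala.io.Source
import scala.xml.pull._

object FoldMessages {
  // Immutable counterpart of the mutable Message class.
  case class Message(to: String = "", from: String = "")

  // current: message being built; text: buffer for the field being read, if any;
  // emitted: set only on the step that closes a <message>.
  case class State(current: Message = Message(),
                   text: Option[String] = None,
                   emitted: Option[Message] = None)

  def step(state: State, ev: XMLEvent): State = {
    val s = state.copy(emitted = None) // an emission lasts exactly one step
    ev match {
      case EvElemStart(_, "message", _, _) => s.copy(current = Message(), text = None)
      case EvElemStart(_, "to" | "from", _, _) => s.copy(text = Some(""))
      case EvText(t) if s.text.isDefined => s.copy(text = s.text.map(_ + t))
      case EvElemEnd(_, "to") =>
        s.copy(current = s.current.copy(to = s.text.getOrElse("").trim), text = None)
      case EvElemEnd(_, "from") =>
        s.copy(current = s.current.copy(from = s.text.getOrElse("").trim), text = None)
      case EvElemEnd(_, "message") => s.copy(emitted = Some(s.current))
      case _ => s // ignore everything else, including entity references
    }
  }

  // Lazily turn the event stream into a stream of completed messages.
  def parse(xml: String): Iterator[Message] =
    new XMLEventReader(Source.fromString(xml))
      .scanLeft(State())(step)
      .collect { case State(_, _, Some(m)) => m }
}
```

Calling FoldMessages.parse(xml).foreach(println) then handles one message at a time, without first collecting the whole document into a list.
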
huynhjl
This is helpful, but still not quite what I'm looking for. I might be able to get something working using this.
ScArcher2
I'm trying to discover the "Scala" way to build a "message" object, or possibly even just a tuple, and then print it all at once instead of printing each tag and text element as I encounter it.
ScArcher2
Thank you for the example! This is exactly the information I was looking for. I am still interested in the "Scala" way of doing it, if there is a more functional way. Thanks again!
ScArcher2