How do I parse an XML document as a stream using Scala? I've used the StAX API in Java to accomplish this, but I'd like to know if there is a "Scala" way to do it.

+1  A: 
scala.xml.XML.loadFile(fileName: String)
scala.xml.XML.load(url: URL)
scala.xml.XML.load(reader: Reader)
scala.xml.XML.load(stream: InputStream)

There are others...

Randall Schulz
Won't that load the entire document into memory? I'd like to parse it and handle it as I receive it and not store the entire document in memory first.
ScArcher2
@ScArcher2: Yes, those are not streaming APIs.
Randall Schulz
+11  A: 

Use package scala.xml.pull. Snippet taken from the Scaladoc for Scala 2.8:

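import scala.xml.pull._
import scala.io.Source
object reader {
  // Any scala.io.Source works here; a string keeps the example small
  val src = Source.fromString("<hello><world/></hello>")
  // In 2.8 the source is passed directly to the constructor
  val er = new XMLEventReader(src)
  def main(args: Array[String]) {
    // Events are pulled one at a time instead of building the whole tree
    while (er.hasNext)
      Console.println(er.next)
  }
}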

You can call toIterator or toStream on er to get a true Iterator or Stream.
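
As a rough sketch of how the events might be consumed once you treat the reader as an iterator: the snippet below pattern matches on EvElemStart and collects the element names as they arrive. The names PullExample and elementNames are made up for illustration and are not part of the Scaladoc, and it assumes a Scala/scala-xml version that still ships scala.xml.pull with the 2.8-style constructor.

```scala
import scala.io.Source
import scala.xml.pull._

// Hypothetical helper object, not part of the library
object PullExample {
  // Returns the labels of all start tags, in document order
  def elementNames(xml: String): List[String] = {
    val reader = new XMLEventReader(Source.fromString(xml))
    try {
      // EvElemStart(prefix, label, attributes, scope); other events are skipped
      reader.collect { case EvElemStart(_, label, _, _) => label }.toList
    } finally {
      reader.stop() // shut down the background parser thread
    }
  }

  def main(args: Array[String]): Unit =
    println(elementNames("<hello><world/></hello>"))
}
```

The same match can be extended with cases for EvText and EvElemEnd if you need the text content or want to track nesting depth.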

And here's the 2.7 version, which is slightly different. In my testing, however, it does not seem to detect the end of the stream, unlike the 2.8 version.

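import scala.xml.pull._
import scala.io.Source

object reader {
  val src = Source.fromString("<hello><world/></hello>")
  // In 2.7 the reader is created empty and then initialized with the source
  val er = new XMLEventReader().initialize(src)

  def main(args: Array[String]) {
    // Note: hasNext may not report the end of the stream in 2.7
    while (er.hasNext)
      Console.println(er.next)
  }
}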
Daniel
I had to change my code to val er = new XMLEventReader().initialize(src), but it looks like it's working. Thanks!
ScArcher2
The 2.7 xml.pull is savagely borked and should be avoided entirely by all but the very brave. The situation in 2.8 is much improved.
Seth Tisue