Given a scala.xml.Node object (with whitespace and elements as child nodes), what's the most efficient way of getting the second (or n-th) child element?

Usually I would go for the built-in (node \ "foo"), but sometimes I have to rely on the position of the element. For example, I could have two choice groups, each of which could be either foo or bar. The document could be

<something>
  <foo/>
  <foo/>
</something>

or

<something>
  <foo/>
  <bar/>
</something>

etc.
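To make the distinction concrete, here is a small sketch that builds both documents and compares selecting by label with listing child elements in order. It assumes Scala 2.13 with the scala-xml module on the classpath; `ChoiceDemo`, `fooFoo`, `fooBar` and `elementLabels` are hypothetical names.

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Elem, Node}

// Hypothetical demo: the two possible shapes of <something> from the question
object ChoiceDemo {
  val fooFoo: Elem =
    <something>
      <foo/>
      <foo/>
    </something>

  val fooBar: Elem =
    <something>
      <foo/>
      <bar/>
    </something>

  // Labels of the child elements in document order; whitespace Text nodes are skipped
  def elementLabels(n: Node): List[String] =
    n.child.collect { case e: Elem => e.label }.toList
}
```

`(ChoiceDemo.fooFoo \ "foo")` has two matches while `(ChoiceDemo.fooBar \ "foo")` has one, so label-based selection alone does not tell you which element sits in the second slot, whereas `elementLabels(fooBar)` gives `List(foo, bar)`.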

+1  A: 

What I have so far is:

node.child.filter(_.isInstanceOf[scala.xml.Elem])(1)
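The same idea can be written with `collect`, which also narrows the result type to `Elem`. Indexing with `(1)` throws when there are fewer than two child elements, while `lift(1)` returns an `Option`. A sketch assuming Scala 2.13 and scala-xml; `SecondElem`, `second` and `secondOption` are hypothetical names.

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Elem, Node}

object SecondElem {
  // Child elements only; whitespace and other text nodes are dropped
  private def elems(node: Node): List[Elem] =
    node.child.collect { case e: Elem => e }.toList

  // Same result as node.child.filter(_.isInstanceOf[Elem])(1), but typed as Elem;
  // throws IndexOutOfBoundsException if there is no second element
  def second(node: Node): Elem = elems(node)(1)

  // Safe variant: None when the node has fewer than two child elements
  def secondOption(node: Node): Option[Elem] = elems(node).lift(1)
}
```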
eed3si9n
+1  A: 

Get the second element named "foo", or None if not found:

(xml \ "foo").drop(1).headOption
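Applied to the two documents from the question, this returns the second `foo` in the first case and `None` in the second. A quick check, assuming Scala 2.13 and scala-xml; `SecondFoo` and `secondFoo` are hypothetical names.

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Elem, Node}

object SecondFoo {
  // Second child element labelled "foo", or None if there are fewer than two
  def secondFoo(node: Node): Option[Node] =
    (node \ "foo").drop(1).headOption

  val fooFoo: Elem = <something><foo/><foo/></something>
  val fooBar: Elem = <something><foo/><bar/></something>
}
```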

Or, more efficiently for large XML structures:

// lazily keep only <foo> elements, then take the second one if present
xml.child.toStream.collect {
  case e: scala.xml.Elem if e.label == "foo" => e
}.drop(1).headOption

(This is with Scala 2.8)
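The same lazy lookup can be generalised to the n-th element with a given label. This sketch assumes Scala 2.13 and scala-xml and uses an `Iterator` in place of `Stream`; `ByLabel` and `nthByLabel` are hypothetical names, and `n` is zero-based.

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Elem, Node}

object ByLabel {
  // n-th (zero-based) child element whose label is `label`, or None.
  // The iterator stops pulling children once the requested match is reached.
  def nthByLabel(node: Node, label: String, n: Int): Option[Elem] = {
    val it = node.child.iterator
      .collect { case e: Elem if e.label == label => e }
      .drop(n)
    if (it.hasNext) Some(it.next()) else None
  }
}
```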

UPDATE

To get the second child element, regardless of its name:

(xml \ "_").drop(1).headOption
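`\ "_"` selects child elements of any label and skips text nodes, so on the second document from the question it yields `bar`. A quick check, assuming Scala 2.13 and scala-xml; `SecondAny` and `secondAny` are hypothetical names.

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Elem, Node}

object SecondAny {
  // "_" matches child elements with any label; text and whitespace nodes are skipped
  def secondAny(node: Node): Option[Node] =
    (node \ "_").drop(1).headOption

  val fooBar: Elem =
    <something>
      <foo/>
      <bar/>
    </something>
}
```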
retronym
Thanks for your answer. Just for clarification, as @huynhjl wrote, I am interested in the second child element, not the second instance of foo.
eed3si9n
+3  A: 

I like retronym's drop(n).headOption pattern, as it returns None instead of throwing when there are n or fewer child elements. But I think you meant the second child element (excluding text nodes), not the second instance of the <foo> tag. With that in mind, combining it with your answer or using collect:

node.child.collect { case x: scala.xml.Elem => x }.drop(n).headOption

node.child.filter(_.isInstanceOf[scala.xml.Elem]).drop(n).headOption
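Note that `n` here is zero-based, so the second child element is `n = 1`. A sketch of the `collect` form as a reusable helper, assuming Scala 2.13 and scala-xml; `NthElem` and `nthElem` are hypothetical names.

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Elem, Node}

object NthElem {
  // Zero-based: n = 1 gives the second child element.
  // Returns None instead of throwing when there are n or fewer child elements.
  def nthElem(node: Node, n: Int): Option[Elem] =
    node.child.collect { case e: Elem => e }.drop(n).headOption
}
```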

This assumes you don't want to extract text nodes, such as the one in:

val node = <something><foo/>text</something>
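If text content should count as a child, one option is to drop only whitespace-only text nodes instead of everything that is not an element. This is a hedged sketch: `NthChild`, `meaningfulChildren` and `nthChild` are hypothetical names, it assumes Scala 2.13 and scala-xml, and treating blank text as insignificant is an assumption about your documents.

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Node, Text}

object NthChild {
  // Keep elements and non-blank text; drop whitespace-only Text nodes used for indentation
  def meaningfulChildren(node: Node): List[Node] =
    node.child.filterNot {
      case t: Text => t.text.trim.isEmpty
      case _       => false
    }.toList

  // Zero-based n-th meaningful child, or None
  def nthChild(node: Node, n: Int): Option[Node] =
    meaningfulChildren(node).drop(n).headOption
}
```

With `<something><foo/>text</something>`, `nthChild(node, 1)` returns the `text` node, where the element-only versions return None.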

Efficiency-wise, the only other thing I can think of is making filter lazy, if you want to retrieve the second child when there is a large number of children. I think this can be achieved by running filter on node.child.iterator instead.

Edit: Changed toIterable to iterator. Good point: calling drop(n) on an ArrayBuffer causes additional allocations, and it is hard to tell how many, since drop seems to be overridden in IndexedSeqLike. Using the iterator addresses that too. So, for a large number of children:

node.child.iterator.filter(_.isInstanceOf[scala.xml.Elem]).drop(n).next()
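To see that the iterator version stops early, the filter predicate can count its calls in both the strict and the iterator form. A sketch assuming Scala 2.13 and scala-xml; `LazyCheck`, `strictSecond` and `lazySecond` are hypothetical names, and the counter exists only for illustration.

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Elem, Node}

object LazyCheck {
  // Strict: filter visits every child before drop/headOption run.
  // Returns the second child element and how many children the predicate examined.
  def strictSecond(node: Node): (Option[Node], Int) = {
    var checked = 0
    val isElem = (n: Node) => { checked += 1; n.isInstanceOf[Elem] }
    val result = node.child.filter(isElem).drop(1).headOption
    (result, checked)
  }

  // Lazy: the iterator only examines children until the second element is found
  def lazySecond(node: Node): (Option[Node], Int) = {
    var checked = 0
    val isElem = (n: Node) => { checked += 1; n.isInstanceOf[Elem] }
    val it = node.child.iterator.filter(isElem).drop(1)
    val result = if (it.hasNext) Some(it.next()) else None
    (result, checked)
  }
}
```

On `<root><a/><b/><c/><d/></root>` both return `b`, but the strict version examines all four children and the iterator version only two; the gap grows with the number of children after the one you want.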

To make it safe, wrap this in a function that checks hasNext before calling next().

All of this has been tested only in Scala 2.8.
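Following the suggestion to check `hasNext`, one way to wrap the iterator version so it returns an `Option` instead of throwing `NoSuchElementException`. A sketch assuming Scala 2.13 and scala-xml; `SafeNth` and `nthElemLazy` are hypothetical names (on 2.13, `Iterator.nextOption()` performs the same check).

```scala
// Assumes Scala 2.13 and the scala-xml module on the classpath
import scala.xml.{Elem, Node}

object SafeNth {
  // Zero-based n; lazily skips non-element children and
  // returns None when there are n or fewer child elements
  def nthElemLazy(node: Node, n: Int): Option[Node] = {
    val it = node.child.iterator.filter(_.isInstanceOf[Elem]).drop(n)
    if (it.hasNext) Some(it.next()) else None
  }
}
```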

huynhjl
So drop(n).headOption buys me safety, but not efficiency? Since child returns an ArrayBuffer, making it iterable only avoids the filtering cost, right?
eed3si9n