tags:

views:

75

answers:

2

Hello

I need to do a pattern in Scala, this is a code:

object Wykonaj{

val doctype = DocType("html", PublicID("-//W3C//DTD XHTML 1.0 Strict//EN","http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"), Nil)

  def main(args: Array[String]) {
  val theUrl = "http://axv.pl/rss/waluty.php"
  val xmlString = Source.fromURL(new URL(theUrl)).mkString
  val xml = XML.loadString(xmlString)
   val zawartosc= (xml \\ "description")
 val pattern="""<descrition> </descrition>""".r 
 for(a <-zawartosc) yield a match{
 case pattern=>println(pattern) 
 }
     }     
}

The problem is, I need to do val pattern=any pattern, to get from

<description><![CDATA[ <img src="http://youbookmarks.com/waluty/pic/waluty/AUD.gif"&gt; dolar australijski 1AUD | 2,7778 | 210/A/NBP/2010 ]]> </description>

only it dolar australijski 1AUD | 2,7778 | 210/A/NBP/2010.

+1  A: 
val zawartosc = (xml \\ "description")
val pattern = """.*(dolar australijski.*)""".r 
val allMatches = (for (a <- zawartosc; text = a.text) yield {text}) collect { 
   case pattern(value) => value }
val result = allMatches.headOption // or .head 

This is mostly a matter of using the right regular expression. In this case you want to match the string that contains dolar australijski. It has to allow for extra characters before dolar. So use .*. Then use the parens to mark the start and end of what you need. Refer to the Java api for the full doc.

With respect to the for comprehension, I convert the XML element into text before doing the match and then collect the ones that match the pattern by using the collect method. Then the desired result should be the first and only element.

huynhjl
A: 

Try

import scala.util.matching.Regex

//...

val Pattern = new Regex(""".*; ([^<]*) </description>""")

//...

for(a <-zawartosc) yield a match {
  case Pattern(p) => println(p)
}

It's a bit of a kludge (I don't use REs with Scala very often), but it seems to work. The CDATA is stringified as &gt; entities, so the RE tries to find text after a semicolon and before a closing description tag.

Hoodiecrow