Hello,

I have a quick question. I am currently writing a Nokogiri/Ruby script and have the following code:

fullId = doc.xpath("/success/data/annotatorResultBean/annotations/annotationBean/concept/fullId")
fullId.each do |e|
  e = e.to_s()
  g.write(e + "\n")
end

This spits out the following text:

<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D001792</fullId>
<fullId>D008715</fullId>

I want to save just the ID text between the <fullId> tags, without the <fullId> and </fullId> markup. What am I missing?

Bobby

A: 

Something like this maybe:

e = e.to_s()[/>([^<]*)</][$1]

or

e.to_s()[/>([^<]*)</]
e = $1

depending on which you prefer

bjg
That works perfectly, thanks. I am pretty new to programming, and I would like to know exactly how that line of code works.
Bobby
That line of code is using a regular expression (in Ruby, regular expressions are delimited by //) to extract all of the text between the > and < characters.
Greg Campbell
The `String` class `[]` method is really just `slice`. It can take a regular expression (i.e. `/../`), in which case it tries to extract the matching substring. The `(..)` inside is called a capture, and you can have several in a single RE. Ruby assigns the captured values to $1, $2, etc. in turn. The first variant above takes advantage of the fact that `[]` can also take a string, which it returns if that string occurs inside the receiver. Since $1 was captured from that very substring, the lookup always succeeds. The second variant is faster because it only slices once.
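To make that concrete, here is a minimal sketch of both variants run on one serialized node. The sample string and the variable names are assumptions for illustration. The last form passes the capture index to `[]` directly, which avoids touching $1 at all.

```ruby
# Sample input (assumed): one node serialized with to_s, as in the question.
xml = "<fullId>D001792</fullId>"

# Variant 1: slice by regex, then slice that result by the captured string.
matched = xml[/>([^<]*)</]   # the whole match, including the > and <
first = matched[$1]          # $1 holds the text captured by (...)

# Variant 2: run the match once and read the capture from $1.
xml[/>([^<]*)</]
second = $1

# Alternative: ask [] for capture group 1 directly.
third = xml[/>([^<]*)</, 1]

puts first, second, third
```

All three end up with the same string; the third form is probably the easiest to read.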
bjg
+9  A: 

I think you want to use the text() accessor (which returns the child text values), rather than to_s() (which serializes the entire node, as you see here).

I'm not sure what the g object you're calling write on is, but the following code should give you an array containing all of the text in the fullId nodes:

doc.xpath(your_xpath).map {|e| e.text}
Greg Campbell
+1 for pointing to the most correct answer
Chubas