I'm using Nokogiri with an XML document that looks something like this:

<songs>
  <song>
    <artist>Juana Molina</artist>
    <album>Un Dia</album>
    <track>8</track>
    <title>Dar (Qu&#233; Dif&#237;cil)</title>
    <rating>5</rating>
    <filename>\Juana Molina\Un Dia\08 - Juana Molina - Dar (Qu&#233; Dif&#237;cil).mp3</filename>
  </song>
</songs>

When I try to find songs whose filenames contain escaped characters, the following XPath query returns nothing, regardless of whether I escape the filename string used in the query:

require 'nokogiri'
require 'builder'

file = File.new("songs.xml")
parser = Nokogiri::XML(file)

# Single quotes keep the backslashes literal
filename = '\Juana Molina\Un Dia\08 - Juana Molina - Dar (Qué Difícil).mp3'

xm = Builder::XmlMarkup.new
filename = xm.text! filename
# => \Juana Molina\Un Dia\08 - Juana Molina - Dar (Qu&#233; Dif&#237;cil).mp3

nodes = parser.root.xpath('//songs/song[filename="'+filename+'"]')
puts nodes

What is the correct way to represent escaped characters in the XPath query?

+1  A: 

Unicode is awkward to work with in MRI 1.8.x, but the idea is to unescape your string first.

require 'cgi'

unescaped_str = CGI.unescapeHTML(File.read('songs.xml'))

I also believe you need to run irb or ruby with -KU in order to force MRI to use Unicode.
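
As a rough sketch of the unescaping idea applied to the query string rather than the whole file: an XML parser resolves character references such as &#233; while parsing, so the text of the filename node should hold the literal é, and the string compared against it in the XPath needs to be the unescaped form. The snippet below assumes Ruby 2.0 or later (UTF-8 string literals) and uses only CGI.unescapeHTML; song_xpath is a hypothetical helper name, and nothing here has been run against Nokogiri itself. Note the single quotes: in a double-quoted Ruby string, \J and \U lose their backslash and \0 turns into a NUL character.

```ruby
require 'cgi'

# The filename as it appears in the raw XML, with numeric character references.
# Single quotes keep the backslashes literal.
escaped = '\Juana Molina\Un Dia\08 - Juana Molina - Dar (Qu&#233; Dif&#237;cil).mp3'

# Decode the references to get the text a parser would expose for the node.
unescaped = CGI.unescapeHTML(escaped)

# Hypothetical helper: builds the XPath expression from the decoded filename.
# Assumes the filename contains no double quotes.
def song_xpath(filename)
  %(//songs/song[filename="#{filename}"])
end

puts song_xpath(unescaped)
```

The resulting string could then be passed to parser.root.xpath in place of the hand-built one from the question.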

Hope this helps.

Jirapong