nokogiri

Safe $variable binding in Nokogiri

Supposing I want to query for the XPath //*[@id=$href]. How can I tell nokogiri to safely bind a value for the $href variable? This is similar to REXML's XPath.first( node, "//*[@id=$href]", nil, {"href"=>"linktohere"}) ...

Grab Kanji webpage using Nokogiri

Hi, I would like to grab a kanji table from a Wikipedia page, and I am having trouble using Nokogiri with special characters. Here is my script: # -*- encoding: utf-8 -*- require 'rubygems' require 'nokogiri' require 'open-uri' link = 'http://en.wikipedia.org/wiki/List_of_j%C5%8Dy%C5%8D_kanji' doc = Nokogiri::HTML(open(link)) doc.encoding = 'U...

Trying to parse an XML file using Nokogiri with Ruby

I am new to programming so bear with me. I have an XML document that looks like this: File name: PRIDE1542.xml <ExperimentCollection version="2.1"> <Experiment> <ExperimentAccession>1015</ExperimentAccession> <Title>**Protein complexes in Saccharomyces cerevisiae (GPM06600002310)**</Title> <ShortLabel>GPM06600002310</ShortL...

Handling an XML file with Ruby and Nokogiri

Hello, I am new to programming so bear with me. I have many XML documents that look like this: File name: PRIDE_Exp_Complete_Ac_10094.xml.gz <ExperimentCollection version="2.1"> <Experiment> <ExperimentAccession>1015</ExperimentAccession> <Title>Protein complexes in Saccharomyces cerevisiae (GPM06600002310)</Title> <ShortL...

Difference between Nokogiri::XML(File.open()) and Nokogiri.parse(open())

I tried opening an XML file both ways, but XPath only worked with the latter. E.g., with doc = as in the title, doc.xpath('//feed/xyz') worked only when I opened the file using the parse method. One thing I noted was that the object I get when opening with XML:: is a Nokogiri::XML::Document, while the latter one was Nokogiri::HTML...

Associate an XML-Stylesheet with an XML Document with Nokogiri

Is it possible to associate a stylesheet with an XML document in Nokogiri, to create this structure? <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="http://www.my-site.com/sitemap.xsl"?> <root> ... </root> ...

How can I use Ruby's Sanitize/Nokogiri to access untagged elements?

Hi all, I'm trying to build a Sanitize transformer that accepts potentially malformed HTML input with elements outside of any tags at all, such as in this example: out of a tag<p>in a tag</p>out again! I want to have the transformer wrap any non-tagged elements in <p> tags so that the above transforms into: <p>out of a tag</p><p>in ...

Why does this Nokogiri command strip out HTML tags?

Hi all, This is a continuation of a previous question. I'm having problems with this Nokogiri snippet: >> require 'nokogiri' >> html = 'bad<p>markup</p>with<img src="foo.jpg">' >> Nokogiri::HTML(html).at_css('body').children.map {|x| '<p>' + x.text + '</p>'}.join('') => "<p>bad</p><p>markup</p><p>with</p><p></p>" What happened to my...

Using Nokogiri with XML files in Ruby

Hello, Quick question: I have XML with the following code: <Experiment> <mzData version="1.05" accessionNumber="1635"> <description> <admin> <sampleName>Fas-induced and control Jurkat T-lymphocytes</sampleName> <sampleDescription> <cvParam cvLabel="MeSH" accession="D017209" name="apoptosis" /> <cvParam cvLabel="UNITY" accession="D21...

Ruby parsing HTML for CSS files

Hi, I am working with some HTML for my site; I am basically moving my site from PHP to Rails. I have literally thousands of pages, and some parts of the site have different CSS files from others. I can grab the tags fine, but I added some conditions for different stylesheets to be loaded if it's IE6/IE7/IE8 etc. I am trying to figure o...

Preventing Nokogiri from escaping characters?

I have created a text node and inserted into my document like so: #<Nokogiri::XML::Text:0x3fcce081481c "<%= stylesheet_link_tag 'style'%>">]> When I try to save the document with this: File.open('ng.html', 'w+'){|f| f << page.to_html} I get this in the actual document: &lt;%= stylesheet_link_tag 'style'%&gt; Is there a way to di...

Nokogiri: Parsing Irregular "<"

I am trying to use nokogiri to parse the following segment <tr> <th>Total Weight</th> <td>< 1 g</td> <td style="text-align: right">0 %</td> </tr> <tr><td class="skinny_black_bar" colspan="3"></td></tr> However, I think the "<" sign in "< 1 g" is causing Nokogiri problems. Does anyone know any workarounds? Is there a...

Regex parse using Nokogiri

Using Nokogiri, I need to parse the following block: <div class="some_class"> 12 AB / 4+ CD <br/> 2,600 Dollars <br/> </div> So I need to get the AB, CD and Dollars values (if they exist). ab = p.css(".some_class").text[....some regex....] cd = p.css(".some_class").text[....some regex....] dollars = p.css(".some_class").text[....some regex......

Searching an XML and getting a subset of the nodes as an XML

Given a search term, how to search the attributes of nodes in an XML and return the XML which contains only those nodes that match the term along with their parents all the way tracing to the root node. Here is an example of the input XML: <root> <node name = "Amaths"> <node name = "Bangles"/> </node> <node name = "C"> ...

Parsing a blogspot XML file with Nokogiri

I have a blogspot exported xml file and it looks something like this: <feed> <entry> <title> title </title> <content type="html"> Content </content> </entry> <entry> <title> title </title> <content type="html"> Content </content> </entry> </feed> How do I parse with Nokogiri and Xpath??? Here is what I have : #!/usr/bin/env ruby r...

How to tell Nokogiri when parsing a document not to convert it to a different encoding (in my case not to convert &pound; to anything else)

Question: how to tell Nokogiri when parsing a document not to convert it to a different encoding (in my case not to convert &pound; to anything else) I have a file with the following contents: <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> </head> <body> <span>&pound;</span> </body> </html> I p...

Quick Nokogiri/Ruby question

Hello, I have a quick parsing question. I have a file with the following structure <admin> <sampleName>Willow oak leaf</sampleName> <sampleDescription comment="Total genes"> <cvParam cvLabel="Bob" accession="123" name="Oak" /> </sampleDescription> </admin> I'm trying to get out the text "Total genes" after the sampleDes...

Issue with a seemingly simple Nokogiri XML parsing problem

Hi all, I have an xml file as follows: <products> <foundation label="New Construction"> <series label="Portrait Series" startImg="img/blank.png"> <item_container nr="1" label="Firebed"> <item next="11" id="" label="Logs Black Brick">img/PortraitSeries/logs-black-brick.png</item> ...

Read and write an XML file using Nokogiri

I'm a newbie to the Nokogiri Ruby gem. I'm wondering how to read an XML file and write back to it. The requirement is that I parse the XML file, make some changes, and save it. f = File.open("elevate.xml") xml = Nokogiri::XML(f) query = Nokogiri::XML::Node.new "query", xml query["text"] = "bank" query.parent = xml.root f.close This above code doe...

Nokogiri response different

Does anyone have a problem with Nokogiri acting differently between two servers (staging and production)? On staging, it grabs and returns the page properly (Nokogiri 1.4.2, Mechanize 1.0.0). On production, it returns a much smaller set of HTML that looks like a canned message (Nokogiri 1.4.2, Mechanize 1.0.0). I found out by running it i...