nokogiri

Nokogiri/Ruby array question

Hello, I have a quick question. I am currently writing a Nokogiri/Ruby script and have the following code: fullId = doc.xpath("/success/data/annotatorResultBean/annotations/annotationBean/concept/fullId") fullId.each do |e| e = e.to_s() g.write(e + "\n") end This spits out the following text: <fullId>D00...

How to extract attribute name and value pair from xml using Nokogiri?

Example: <fruit name="mango"/> I want to get output as: name="mango" ...

Any Ruby models to traverse DOMs quickly?

Does anyone know of any Ruby libraries/gems that allow you to traverse a DOM quickly? I need something which is fast, and doesn't have a lot of dependencies. I've been trying to use Nokogiri, but I'm concerned with the number of 'bug segmentation faults' I've been getting. ...

I'm trying to extract each a href link on an html page for evaluation w/ nokogiri and xpath

I'm trying to extract each a href link on an html page for evaluation w/ nokogiri and xpath. What I have so far seems to be pulling the page titles out only. I'm not interested in the link title, but rather just the URL that is being pointed to. Here's what I have: doc = Nokogiri::HTML(open("http://www.cnn.com")) doc.xpath('//a').each...

Nokogiri html parsing question

Hello, I'm having trouble figuring out why I can't get keywords to parse properly through nokogiri. In the following example, I have the a href link text functionality working properly but cannot figure out how to pull the keywords. This is the code I have thus far: ..... doc = Nokogiri::HTML(open("http://www.cnn.com")) doc.xpath('/...

Find and replace entire HTML nodes with Nokogiri

Hi, I have an HTML document that should be transformed by replacing some tags with other tags. I don't know these tags in advance, because they will come from the DB. So the "set_attribute" or "name" methods of Nokogiri are not suitable for me. I need to do it in a way like this pseudo-code: def preprocess_content doc = Nokogiri::HTML( self...

Strange problems with Nokogiri

Say we have an HTML document in which every ... <div class="replace-me"> </div> ... must be replaced with <video src='my_video.mov'></video> The code is as follows: doc.css("div.replace-me").each do |div| div.replace "<video src='my_video.mov'></video>" end It's simple but, unfortunately, it doesn't work for me. Nokogiri crashes with f...

Use nokogiri to split content on a <br> element

Hi, How can I use nokogiri to split the following HTML into text nodes? I want to somehow split the content by using the <br/> tag as a delimiter or sadly an unclosed <br> which is often the case in the HTML I am scraping. An example of the html would be: <td> <font size="2" face="Arial"><b>HALL (J&amp;E) LTD</b><br> ...

How to split a HTML document using nokogiri?

Right now, I am splitting the HTML document into small pieces like this: (regular expression simplified - skipping header tag content and closing tag) document.at('body').inner_html.split(/<\s*h[2-6][^>]*>/i).collect do |fragment| Nokogiri::HTML(fragment) end Is there an easier way to perform that splitting? The document is very simple, j...

Function 'xsltParseStylesheetDoc' not found in [libxml2.so]

This error comes up in Redhat Enterprise Linux Server 5.4 - 64 bit. Linux rhl-64-tibbr5 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux There is also this error in the stack trace. uninitialized constant Nokogiri::VERSION_INFO More version details: jruby-1.4.0RC1 ruby/gems/1.8/gems/activesupport-2....

Can nokogiri search for "?xml-stylesheet" tags ?

I need to parse an XML stylesheet declaration: <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="/templates/xslt/inspections/disclaimer_en.xsl"?> Using nokogiri I have tried using doc.search("?xml-stylesheet").first['href'] but I get the error `on_error': unexpected '?' after '' (Nokogiri::CSS::SyntaxErro...

Nokogiri and XPath help

Admittedly, I'm a Nokogiri newbie and I must be missing something... I'm simply trying to print the author > name node out of this XML: <?xml version="1.0" encoding="UTF-8"?> <entry xmlns:gd="http://schemas.google.com/g/2005" xmlns:docs="http://schemas.google.com/docs/2007" xmlns="http://www.w3.org/2005/Atom" gd:etag=""> <category te...

using xpath on single Nokogiri node returns elements in all nodes

I am parsing an XML doc that looks something like this: <MyBook> <title>Favorite Poems</title> <issn>123-456</issn> <pages>45</pages> </MyBook> <MyBook> <title>Chocolate Desserts</title> <issn>654-098</issn> <pages>100</pages> </MyBook> <MyBook> <title>Jabberwocky</title> <issn>454-545</issn> <pages>19</pages>...

select tr>3 with nokogiri

I want to get the rows that contain more than 3 columns. How do I write the XPath with nokogiri? require 'rubygems' require 'nokogiri' item='sometext' doc = Nokogiri::HTML.parse(open(item)) data=doc.xpath('/html/body/table/tr[@td.size>3]') puts data It does not run; help and advice appreciated. ...

Using Nokogiri and XPath to get nodes with multiple attributes

I'm trying to use Nokogiri to parse an HTML file with some fairly eccentric markup. Specifically, I'm trying to grab divs which have both ids, multiple classes and styles defined. The markup looks something like this: <div id="foo"> <div id="bar" class="baz bang" style="display: block;"> <h2>title</h2> <dl> List of stu...

How to add attribute to Nokogiri node?

I'm trying to add an attribute to an existing Nokogiri node. What I've done is this: node.attributes['foobar'] = Nokogiri::XML::Attr.new('foo', 'bar') But I get the error: TypeError Exception: wrong argument type String (expected Data) What is a Data data type, and how do I add an attribute to the Nokogiri object? Thanks! ...

How should I refactor this?

I would like to collect and store all this info into an array. I have the following, how should I refactor this? require 'rubygems' require 'nokogiri' require 'open-uri' @urls = %w{http://url_01.com http://url_02.com http://url_03.com} @link_01_arr = [] @link_02_arr = [] @link_03_arr = [] link_01 = Nokogiri::HTML(open("#{@urls[0]}"...

Use Nokogiri to get all nodes in an element that contain a specific attribute name

Hi, I'd like to use Nokogiri to extract all nodes in an element that contain a specific attribute name. e.g., I'd like to find the 2 nodes that contain the attribute "blah" in the document below. @doc = Nokogiri::HTML::DocumentFragment.parse <<-EOHTML <body> <h1 blah="afadf">Three's Company</h1> <div>A love triangle.</div> <b b...

Can any of Ruby's HTML Parsers do Javascript to see the resulting DOM?

When trying Hpricot and Nokogiri, the HTML can be fetched and parsed, but can they also execute the JavaScript so that the content shows up on the page (that is, in the DOM)? That's because some pages won't show the info unless the JavaScript interpreter has run. ...

XML schema validation and pattern errors

I have a problem validating a perfectly valid XML with its schema file in Ruby. It works OK on my development machine (OS X 10.6) but fails every time on the production system (Debian 4.1). The part of the XML that gives errors is this: <ROUNDINGS>-0.02</ROUNDINGS> And the XSD pattern is this: <xsd:element name="ROUNDINGS"> <xsd:s...