nokogiri

Nokogiri (RubyGem): Find and replace HTML tags

I have the following HTML: <html> <body> <h1>Foo</h1> <p>The quick brown fox.</p> <h1>Bar</h1> <p>Jumps over the lazy dog.</p> </body> </html> ...and by using the RubyGem Nokogiri (an Hpricot replacement), I'd like to change it into the following HTML: <html> <body> <p class="title">Foo</p> <p>The quick brown fox.</p> <p class="title"...

Nokogiri: Putting a group of <p> inside a <div>

I'd like to figure out how to get the HTML result (shown further below) using the following Ruby code and the Nokogiri RubyGem: require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) "<html> <body> <p id='1'>A</p> <p id='2'>B</p> <h1>Bla</h1> <p id='3'>C</p> ...

Nokogiri: Searching for <div> using XPath.

I use Nokogiri (Rubygem) css search to look for certain <div> inside my html. It looks like Nokogiri's css search doesn't like regex. I would like to switch to Nokogiri's xpath search as this seems to support regex in search strings. How do I implement the (pseudo) css search mentioned below in an xpath search? require 'rubygems' requi...

Nokogiri: Navigating the DOM.

I'm trying to fill the variables parent_element_h1 and parent_element_h2. Can anyone help me use the Nokogiri Gem to get the information I need into those variables? require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) "<html> <body> <p id='para-1'>A</p> <div class='block' id='X1'> ...

How do I get an installed ruby gem included in rails?

I am attempting to get a gem I've just installed working in a rails application. I can require the gem just fine in a ruby program that I run from the command line using: require 'nokogiri' But when I attempt to do the same in one of my rails controllers it errors saying "no such file to load -- nokogiri". I tried using the full pat...

Nokogiri: Select content between element A and B

What's the smartest way to have Nokogiri select all content between the start and the stop element (including start-/stop-element)? Check example code below to understand what I'm looking for: require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) "<html> <body> <p id='para-1'>A</p> <div clas...

Nokogiri: Sort Array of IDs according to order in HTML document.

I have an unsorted Array holding the following IDs: @un_array = ['bar', 'para-3', 'para-2', 'para-7'] Is there a smart way of using Nokogiri (or plain Javascript) to sort the array according to the order of the IDs in the example HTML document below? require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) ...

How do I get Nokogiri to understand my namespaces?

I have the following XML document: <samlp:LogoutRequest ID="123456789" Version="2.0" IssueInstant="200904051217"> <saml:NameID>@NOT_USED@</saml:NameID> <samlp:SessionIndex>abcdefg</samlp:SessionIndex> </samlp:LogoutRequest> I'd like to get the content of the SessionIndex (that is, 'abcdefg') out of it. I've tried this: XPATH_QUE...

Extract links (URLs) from a href HTML tags with Nokogiri in Ruby?

Hi, I want to extract all URLs from a webpage. How can I do that with Nokogiri? Example: <div class="heat"> <a href='http://example.org/site/1/'>site 1</a> <a href='http://example.org/site/2/'>site 2</a> <a href='http://example.org/site/3/'>site 3</a> </div> The result should be a list: l = ['http://example.org/site/1...

Can I get html elements with nokogiri?

Hi, I have a question about Nokogiri. I need to get the HTML elements from a page and get the XPath for each one. The problem is that I can't figure out how to do it with Nokogiri. The HTML is arbitrary, because I have to parse several pages from different websites. ...

Nokogiri: How can I add a child to a node at a specific position?

Hello all, I have a node which has two children: an XML text and an XML element. <h1 id="Installation-blahblah">Installation on server<a href="#Installation-blah" class="wiki-anchor">&para;</a> In this case the XML text is: Installation on server and the XML element: <a href="#Installation-blah" class="wiki-anchor">anchor;</...

Nokogiri scrubs style and script tags in after/before

Hi, I'm trying to add a bunch of html to an existing nodeset, at the top. It mostly works, but the style tags and script tags are getting scrubbed of their content. Here's what I mean: doc.xpath("//head/*[1]").before("<script>var xb=25</script>") But if I try to display this, this is what I get: hdoc.xpath("//head/*[1]") => <script><...

How to loop through a table and turn rows into objects using Nokogiri

I want to use Nokogiri to loop through an HTML table and create an object corresponding to every row. I am able to define the root XPaths that the data for the object variables comes from, but I don't know how to group these as an object. My code is below. I know it doesn't work, but I don't know what direction to go in to make it work. ...

How to parse xml files with nokogiri and put the results in a new file?

Hello all, I'm just beginning with Nokogiri and have a question, hope you guys can help me out: 1) I need to parse a set of xml files (let's say 5 files). 2) Find elements with specific value (for instance, City = "London"), with XPATH. 3) Have a new xml file, with the results of the previous xpath parsing. ...

The Most Basic Nokogiri Program Fails -- Documentation Problem or Bug?

I decided to give Nokogiri a try, and copied the following program straight from http://nokogiri.rubyforge.org/nokogiri/Nokogiri.html (adding only the require 'rubygems' and the I_KNOW_I_AM_USING_AN_OLD_AND_BUGGY_VERSION_OF_LIBXML2 constant): require 'rubygems' I_KNOW_I_AM_USING_AN_OLD_AND_BUGGY_VERSION_OF_LIBXML2 = 1 require 'nokogiri'...

Nokogiri oddness?

A sample of some oddness: #!/usr/bin/ruby require 'rubygems' require 'open-uri' require 'nokogiri' print "without read: ", Nokogiri(open('http://weblog.rubyonrails.org/')).class, "\n" print "with read: ", Nokogiri(open('http://weblog.rubyonrails.org/').read).class, "\n" Running this returns: without read: Nokogiri::XML::Document...

Using Ruby and Nokogiri to parse HTML using HTML comments as markers

How could I use ruby to extract information from a table consisting of these rows? Is it possible to detect the comments using nokogiri?   EXTRACT LINK 1 EXTRACT DESCRIPTION EXTRACT LINK 2 Mr P 1 ...

Using Ruby and Nokogiri to select a href links based on part of the URL

I have a document containing a href links I want to extract. The links I want can be identified by part of the URL they link to. There are other, similar links which I want to discard. The URLs of the links I want are of the format http://www.xxxxxxxxxxxxxxxxxxx.com/index.php?showtopic=44&hl= I want to search for links con...

Convert a Nokogiri document to a Ruby Hash

Is there an easy way to convert a Nokogiri XML document to a Hash? Something like Rails' Hash.from_xml. ...

Find tag with id including [] with Nokogiri

I have an html element like: <div id="spam[500]"> I want to search for this element by id, but it seems that nokogiri is getting confused by the []. I'm trying: doc.css("#spam[#{eggs.id}]") but to no avail. ...