nokogiri

How do you know when to use an XML parser and when to use ActiveResource?

I tried using ActiveResource to parse a web service that was more like an HTML document and I kept getting a 404 error. Do I need to use an XML parser for this task instead of ActiveResource? My guess is that ActiveResource is only useful if you are consuming data from another Rails app and the XML data is easily translatable to a Rail...

inserting and deleting nokogiri XML nodes and elements

My basic aim is to extract parts of an XML file and leave a note in that file saying that I extracted something (like "here something was extracted"). After experimenting a lot with Nokogiri, it does not seem to be well documented how to 1) delete all children of a <Nokogiri::XML::Element> 2) change the inner_text of that complete element any clu...

How do I validate XHTML with nokogiri?

I've found a few posts alluding to the fact that you can validate XHTML against its DTD using the nokogiri gem. Whilst I've managed to use it to parse XHTML successfully (looking for 'a' tags etc.), I'm struggling to validate documents. For me, this: doc = Nokogiri::XML(Net::HTTP.get(URI.parse("http://www.w3.org"))) puts doc.validate ...

Transitioning from Scrubyt to Nokogiri- Write to XML or Hash?

I'm trying to transition this bit of code from scrubyt to nokogiri, and am stuck trying to write my results to either a hash or xml. In scrubyt it looks like the following: require 'rubygems' require 'scrubyt' result_data = Scrubyt::Extractor.define do fetch "http://rads.stackoverflow.com/amzn/click/0061673730" results "//d...

libxml-ruby parsing HELP...

Alright, I'm switching from a working Hpricot setup to Libxml-ruby due to speed and, well, the disappearance of _why. I looked at Nokogiri for a second but decided to go with Libxml-ruby for speed and longevity. I must be missing something basic, but what I'm trying to do isn't working. Here's my XML string: file =<<XML <?xml version="1.0" encoding="utf-8...

Adding nodes with namespaces to an XML file with Nokogiri

Hello all, I'm having trouble editing an XML file. I'm currently trying to use Nokogiri, but I'm open to any other Ruby library to solve this problem. I'm trying to add a Node set inside another node set. Both have some interesting namespacing. Here's the code. I'm trying to add the new_node to the parent right after the first <p:sp> ...

[Ruby] open-uri + hpricot & nokogiri don't parse html correctly

I'm trying to parse a webpage using open-uri + hpricot, but there seems to be a problem in the parsing process, as the gems don't return what I want. Specifically, I want to get this div (whose id is 'pasajes') at this URL: http://www.despegar.com.ar I wrote this code: require 'nokogiri' require 'hpricot' require 'open-uri' docu...

nokogiri xpath expressions not parsing

I am using Nokogiri 1.3.3 with Ruby 1.8.7. I am trying to match the content of a tag as described in this SO question: nodeset.xpath("entry/index[. = '#{index.to_s}']/../categories") Nokogiri raises an exception complaining about the '.' after the bracket. When I replace the '.' with text() it then complains about the second period. ...

How to handle escaped characters in XPath expressions for Nokogiri

I'm using nokogiri with an xml document that looks something like this: <songs> <song> <artist>Juana Molina</artist> <album>Un Dia</album> <track>8</track> <title>Dar (Qu&#233; Dif&#237;cil)</title> <rating>5</rating> <filename>\Juana Molina\Un Dia\08 - Juana Molina - Dar (Qu&#233; Dif&#237;cil).mp3</filename> ...

HTML Entity problems using Nokogiri::XML.fragment

Hi everybody, it seems that all entities are killed when using: tags = "<p>test umlauts &ouml;</p>" Nokogiri::XML.fragment(tags) Result: <p>test umlauts </p> The above method calls Nokogiri::XML::DocumentFragment.parse(tags) and that method calls Nokogiri::XML::DocumentFragment.new(XML::Document.new, tags). In relation to the nokogir...

How to use rake to insert/replace html section in each file?

I'm using rake to create a Table of contents from a bunch of static HTML files. The question is how do I insert it into all files from within rake? I have a <ul id="toc"> in each file to aim for. The entire content of that I want to replace. I was thinking about using Nokogiri or similar to parse the document and replace the DOM node...

Optimizing ActiveRecord Point-in-Polygon Search

Hello stackies. The following PiP search was built for a project that lets users find their NYC governmental districts by address or lat/lng (http://staging.placeanddisplaced.org). It works, but it's kinda slow, especially when searching through districts that have complex polygons. Can anyone give me some pointers on optimizing this code...

Getting elements in the order they appear in the document

I have a document and want to extract a couple of elements which are direct descendants of the parent element but leave out others. The problem is that I don't get the elements in the order they appear in the document. The reason might actually be that the CSS selector I am using is wrong... require 'rubygems' require 'nokogiri' require ...

How to use nokogiri from Jruby on Windows?

Hello, I'm getting the following error when trying to use Nokogiri with Jruby on Windows 7: D:\code\h4>jruby -e "require 'rubygems'; require 'nokogiri'" D:/jruby-1.3.1/bin/../lib/ruby/1.8/ffi/library.rb:18:in `ffi_lib': Could not open any of [xml2, xslt, exslt] (LoadError) from D:/jruby-1.3.1/lib/ruby/gems/1.8/gems/nokogiri-1.3....

Get element text from xml doc

Hi, I'm trying to extract some information from an online xml weather resource (weather underground). I am able to open the resource and pull out the desired elements, but what I really want is to return the element text as a variable, without the containing xml element tags, so I can manipulate it and display it on a web page. Perhap...

Searching all elements before an h2 element in hpricot/nokogiri

I am attempting to parse a Wiktionary entry to retrieve all English definitions. I am able to retrieve all definitions; the problem is that some definitions are in other languages. What I would like to do is somehow retrieve only the HTML block with English definitions. I have found that, in the case that there are other language entri...

how to explode <br> <br/> <br /> tags in a string?

I have a string with a bunch of break tags. Unfortunately they are irregular: <Br> <BR> <br/> <BR/> <br /> etc... I am using Nokogiri, but I don't know how to tell it to break up the string at each break tag.... Thanks. ...
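
One possible approach to the question above, sketched in plain Ruby rather than through Nokogiri: split the string on a case-insensitive regular expression that tolerates optional whitespace and an optional slash. The helper name `split_on_br` is hypothetical and is not part of Nokogiri. This is a minimal sketch that assumes the break tags carry no attributes.

```ruby
# Hypothetical helper (not a Nokogiri method): split a string at every
# <br> variant, ignoring case, optional whitespace and an optional slash.
def split_on_br(html)
  html.split(/<br\s*\/?>/i) # matches <br>, <Br>, <BR>, <br/>, <BR/>, <br />
      .map(&:strip)         # trim whitespace around each piece
      .reject(&:empty?)     # drop empty pieces left by adjacent tags
end

parts = split_on_br("one<Br>two<BR/>three<br />four")
puts parts.inspect
```

A regular expression is enough here only because the tags are simple; break tags with attributes, or ones inside comments, would call for a real parser.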

Nokogiri: How to select nodes by matching text?

If I have a bunch of elements like: <p>A paragraph <ul><li>Item 1</li><li>Apple</li><li>Orange</li></ul></p> Is there a built in nokogiri method that would get me all, for example, p elements that contain the text "Apple"? (the example element above would match, for instance). ...

read_timeout in Nokogiri?

Hi, I'm fetching some weather data from an online xml doc using Nokogiri, and I would like to set up a timeout for graceful recovery in case the source can't be reached... My google searches show several possible methods for open-uri and Net::HTTP, but none specific to Nokogiri. My attempts to use those methods are failing (not too sur...

Nokogiri: how to search for certain element, and output the full traverse ?

Using Nokogiri, I want to find <p class="main"> Some text here...</p> in an HTML document, and then output its location as below, or something that shows the tree: html > body > div class = "body" > p class= "main " ...