questions about nokogiri | ansaurus

nokogiri

ruby string encoding

So, I'm trying to do some screen scraping off of a certain site using nokogiri, but the site owners failed to specify the proper encoding of the page in a <meta> tag. The upshot of this is that I'm trying to deal with strings that think they're utf-8, but really aren't. (If you care, here are the files I was using to test this: main ...

character-encoding

Merging HTML files

I want to merge one HTML file into another. Not just include it, but merge. Example master.html: <!DOCTYPE html> <html> <head> <title>My cat</title> </head> <body> <h1>My cat is awesome!</h1> </body> </html> _index.html: <!DOCTYPE html> <html> <body> <p><img src="cat.jpg"/></p> </body> </html> Now I merge ...

RVM 1.9.1 & nokogiri

Having trouble installing the nokogiri gem under rvm ruby 1.9.1. gem install nokogiri I'm getting ... /usr/include/libxml2... no libxml2 is missing. try 'port install libxml2' or 'yum install libxml2-devel' *** extconf.rb failed *** but I checked: sudo apt-get install libxml2 and I got: Reading state information... Done libxm...

Convert YAML to XML in Ruby?

Are there any scripts out there, or have any of you built a tool, to convert YAML to XML using Nokogiri? If not, any suggestions or samples? ...

Can Nokogiri use a SAX parser to parse an HTML fragment?

I have this code. class MyParser < Nokogiri::XML::SAX::Document def characters(string) LOG.debug("characters #{string}") end def start_element(name, attrs = []) LOG.debug("start_element #{name}") end def end_element(name) LOG.debug("end_element #{name}") end end parser = Nokogiri::HTML::SAX::Parser.new(MyParse...

How can unwanted tags be removed from HTML using Nokogiri?

I need to strip out all font tags from a document. When attempting to do so with the following Ruby code, other elements and text within the font tags are lost. I've also attempted to iterate through all children elements and make them siblings of the font tag before unlinking the font tag--which also results in lost HTML. What is a g...

Nokogiri Error: undefined method `radiobutton_with' - Why?

Hello! I'm trying to access a form using Mechanize (Ruby). My form has a group of radio buttons, and I want to check one of them. I wrote: target_form.radiobutton_with(:name => "radiobuttonname")[2].check In this line I want to check the radio button with the value of 2. But in this line, I get an error: : undefined method `radiobutto...

RUBY Nokogiri CSS HTML Parsing

I'm having some problems trying to get the code below to output the data in the format that I want. What I'm after is the following: CCC1-$5.00 CCC1-$10.00 CCC1-$15.00 CCC2-$7.00 where $7 belongs to CCC2 and the others to CCC1, but I can only manage to get the data in this format: CCC1-$5.00 CCC1-$10.00 CCC1-$15.00 ...

How to get Nokogiri to ignore HTML elements that don't exist

Any idea how I can get the code below to produce this output? 1 - 2 - B I'm getting this error "undefined method `text' for nil:NilClass (NoMethodError)", because I think table 1 does not have the element 'td class=r2' in it. require 'rubygems' require 'nokogiri' require 'open-uri' doc = Nokogiri::HTML.parse(<<-eohtml) <table cla...

Parsing data without HTML tags

Hi, I need to extract the actual phone number from the HTML listed below, but I'm not really sure how to do it using Nokogiri CSS since there are no HTML tags around it. When I use at_css(.phonetitle) it only parses Phone and not the number. <div class="detail"> <span class="address">Corner of Toorak Road and Chapel Street, South Yarra...

Rake won't create XML file

I'm a bit lost here as to why my rake task will not create the desired XML file, however it works fine when I have the method build_xml in an .rb file. require 'rubygems' require 'nokogiri' require 'open-uri' namespace :xml do desc "xml build test" task :xml_build => :environment do build_xml end end def build_xml #...

nokogiri xml unescape

Hi, I'm just trying out the Nokogiri XML builder, but am having some problems trying to unescape the content. I have spent a bit of time googling but so far can't find the answer. Any help would be greatly appreciated. #build xml document builder = Nokogiri::XML::Builder.new do |xml| xml.root{ xml.node { xml.v...

Rails - strip xml import from whitespace and line break

Hey folks, I am stuck with something quite simple but really annoying: I have an XML file with one node, where the content includes line breaks and whitespace. Sadly I can't change the XML. <?xml version="1.0" encoding="utf-8" ?> <ProductFeed> ACME Ltd. Fooproduct Foo Root :: Bar Category I get to the nod...

Contents of a node in Nokogiri

Is there a way to select all the contents of a node in Nokogiri? <root> <element>this is <hi>the content</hi> of my æøå element</element> </root> The result of getting the content of /root/element should be this is <hi>the content</hi> of my æøå element Edit: It seems like the solution is simply to use myElement.inner_html(). Th...

How Do I Select for Multiple Classes Using Nokogiri and Ruby

From a table element, I would like to select all rows that have the class even or the class odd. I tried the jQuery syntax: report.css("table.data tr[class~=odd even]").each{|line| parse_line_item(line)} but it threw an error. Any help is appreciated, thanks. ...

What would cause native gem extensions on OS X to build but fail to load?

I am having trouble with some of my rubygems, in particular those that use native extensions. I am on a MacBookPro, with Snow Leopard. I have XCode 3.2.1 installed, with gcc 4.2.1. Ruby 1.8.6, because I'm lazy and a scaredy cat and don't want to upgrade yet. Ruby is running in 32-bit mode. I built this ruby from scratch when my MBP ran ...

How to find an element by name using Nokogiri?

<input type="Checkbox" checked="" name="new"> if I have the above html in a document, how would I find it by searching for its name attribute? Edit 1: Clarified that I was looking for a solution using Nokogiri ...

How do I debug a Net::HTTPInternalServerError error when using Mechanize?

c:/ruby/lib/ruby/gems/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:259:in `get': 500 => Net::HTTPInternalServerError (Mechanize::ResponseCodeError) I get the above error when I try to navigate to the following webpage http://fakewebsite.com//admin/edit_building.cfm?page=buildings&updateMode=yes&id=1251 I can navigate just fine ...

How do I print out the cookies that Mechanize has stored?

I'm using Mechanize to log in to a website and then retrieve a page. I'm running into some problems and I suspect this is due to some values in the cookies. When Mechanize logs into a website I assume it stores the cookies. How do I print out all the data stored in the cookies by Mechanize? ...

How to make Nokogiri transparently return un/encoded HTML entities untouched?

How can I use Nokogiri while leaving HTML entities (like German umlauts) untouched? I.e.: # this is fine node = Nokogiri::HTML.fragment('<p>ö</p>') node.to_s # => '<p>ö</p>' # this is not node = Nokogiri::HTML.fragment('<p>ö</p>') node.to_s # => '<p>ö</p>' # this is what I need node = Nokogiri::HTML.fragment('<p>ö</p>') ...
