So, I'm trying to do some screen scraping off of a certain site using Nokogiri, but the site owners failed to specify the proper encoding of the page in a <meta> tag. The upshot is that I'm dealing with strings that think they're UTF-8 but really aren't.
(If you care, here are the files I was using to test this:
main ...
I want to merge one HTML file into another. Not just include it, but merge.
Example
master.html:
<!DOCTYPE html>
<html>
<head>
<title>My cat</title>
</head>
<body>
<h1>My cat is awesome!</h1>
</body>
</html>
_index.html:
<!DOCTYPE html>
<html>
<body>
<p><img src="cat.jpg"/></p>
</body>
</html>
Now I merge ...
I'm having trouble installing the nokogiri gem under RVM with Ruby 1.9.1.
gem install nokogiri
I'm getting ...
/usr/include/libxml2... no
libxml2 is missing. try 'port install libxml2' or 'yum install libxml2-devel'
*** extconf.rb failed ***
But I checked:
sudo apt-get install libxml2
and I got:
Reading state information... Done
libxm...
Are there any scripts out there, or have any of you built a tool, to convert YAML to XML using Nokogiri? If not, any suggestions or samples?
...
I have this code.
class MyParser < Nokogiri::XML::SAX::Document
  def characters(string)
    LOG.debug("characters #{string}")
  end

  def start_element(name, attrs = [])
    LOG.debug("start_element #{name}")
  end

  def end_element(name)
    LOG.debug("end_element #{name}")
  end
end
parser = Nokogiri::HTML::SAX::Parser.new(MyParse...
I need to strip out all font tags from a document. When attempting to do so with the following Ruby code, other elements and text within the font tags are lost. I've also attempted to iterate through all children elements and make them siblings of the font tag before unlinking the font tag--which also results in lost HTML. What is a g...
Hello!
I'm trying to access a form using Mechanize (Ruby).
On my form I have a group of radio buttons.
I want to check one of them.
I wrote:
target_form.radiobutton_with(:name => "radiobuttonname")[2].check
In this line I want to check the radiobutton with the value of 2.
But in this line, I get an error:
: undefined method `radiobutto...
I'm having some problems trying to get the code below to output the data in the format that I want. What I'm after is the following:
CCC1-$5.00
CCC1-$10.00
CCC1-$15.00
CCC2-$7.00
where $7 belongs to CCC2 and the others to CCC1, but I can only manage to get the data in this format:
CCC1-$5.00
CCC1-$10.00
CCC1-$15.00
...
Any idea how I can get the code below to produce this output?
1 -
2 - B
I'm getting the error "undefined method `text' for nil:NilClass (NoMethodError)", I think because table 1 does not have a 'td class=r2' element in it.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML.parse(<<-eohtml)
<table cla...
Hi,
I need to extract the actual phone number from the HTML listed below, but I'm not really sure how to do it using Nokogiri CSS since there are no HTML tags around it. When I use at_css('.phonetitle'), it only parses "Phone" and not the number.
<div class="detail">
<span class="address">Corner of Toorak Road and Chapel Street, South Yarra...
I'm a bit lost as to why my rake task will not create the desired XML file; however, it works fine when I have the method build_xml in an .rb file.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
namespace :xml do
  desc "xml build test"
  task :xml_build => :environment do
    build_xml
  end
end
def build_xml
#...
Hi,
I'm just trying out the Nokogiri XML builder, but I'm having a problem trying to unescape the content. I've spent a bit of time googling but so far can't find the answer.
Any help would be greatly appreciated.
# build XML document
builder = Nokogiri::XML::Builder.new do |xml|
xml.root{
xml.node {
xml.v...
Hey folks,
I am stuck with something quite simple but really annoying:
I have an XML file with one node whose content includes line breaks and whitespace.
Sadly I can't change the XML.
<?xml version="1.0" encoding="utf-8" ?>
<ProductFeed>
ACME Ltd.
Fooproduct
Foo Root :: Bar Category
I get to the nod...
Is there a way to select all the contents of a node in Nokogiri?
<root>
<element>this is <hi>the content</hi> of my æøå element</element>
</root>
The result of getting the content of /root/element should be this is <hi>the content</hi> of my æøå element
Edit:
It seems like the solution is simply to use myElement.inner_html(). Th...
From a table element, I would like to select all rows that have the class even or the class odd.
I tried the jQuery syntax:
report.css("table.data tr[class~=odd even]").each{|line| parse_line_item(line)}
but it threw an error. Any help is appreciated, thanks.
...
I am having trouble with some of my rubygems, in particular those that use native extensions.
I am on a MacBookPro, with Snow Leopard. I have XCode 3.2.1 installed, with gcc 4.2.1. Ruby 1.8.6, because I'm lazy and a scaredy cat and don't want to upgrade yet. Ruby is running in 32-bit mode. I built this ruby from scratch when my MBP ran ...
<input type="Checkbox" checked="" name="new">
If I have the above HTML in a document, how would I find it by searching for its name attribute?
Edit 1: Clarified that I was looking for a solution using Nokogiri
...
c:/ruby/lib/ruby/gems/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:259:in `get': 500 => Net::HTTPInternalServerError (Mechanize::ResponseCodeError)
I get the above error when I try to navigate to the following webpage
http://fakewebsite.com//admin/edit_building.cfm?page=buildings&updateMode=yes&id=1251
I can navigate just fine ...
I'm using Mechanize to log in to a website and then retrieve a page. I'm running into some problems and I suspect they are due to some values in the cookies. When Mechanize logs in to a website, I assume it stores the cookies.
How do I print out all the data stored in the cookies by Mechanize?
...
How can I use Nokogiri while leaving HTML entities (such as German umlauts) untouched?
I.e.:
# this is fine
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>ö</p>'
# this is not
node = Nokogiri::HTML.fragment('<p>&ouml;</p>')
node.to_s # => '<p>ö</p>'
# this is what I need
node = Nokogiri::HTML.fragment('<p>&ouml;</p>')
...