I have the following code, which retrieves the title of each URL from an array of URLs:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
@urls = ["http://google.com", "http://yahoo.com", "http://rubyonrails.org"]
@found_titles = Array.new
@found_titles[0] = Nokogiri::HTML(open("#{@urls[0]}")).search("title").inn...
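If the goal is to fill the titles array for every URL rather than one index at a time, mapping over the list is one option. The sketch below is only an illustration: `fetch_title` and `collect_titles` are hypothetical names that do not appear in the original, `fetch_title` assumes the nokogiri gem and network access, and the fetcher is passed in as a parameter so the loop can be tried offline with a stub.

```ruby
# Hypothetical helper: fetch one page and return its <title> text.
# Assumes the nokogiri gem is installed and the URL is reachable.
def fetch_title(url)
  require 'nokogiri'
  require 'open-uri'
  Nokogiri::HTML(URI.open(url)).at_css("title")&.text
end

# Build the titles array in one pass. `fetcher` is any callable that
# maps a URL to a title (it defaults to the helper above).
def collect_titles(urls, fetcher = method(:fetch_title))
  urls.map { |url| fetcher.call(url) }
end

urls = ["http://google.com", "http://yahoo.com", "http://rubyonrails.org"]

# Stubbed fetcher so the sketch runs offline; these titles are made up.
stub = ->(url) { "Title of #{url}" }
found_titles = collect_titles(urls, stub)
puts found_titles
```

Calling `collect_titles(urls)` without the stub would use the Nokogiri-based helper instead.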
The following code returns an error:
require 'nokogiri'
require 'open-uri'
@doc = Nokogiri::HTML(open("http://www.amt.qc.ca/train/deux-montagnes/deux-montagnes.aspx"))
#@doc = Nokogiri::HTML(File.open("deux-montagnes.html"))
stations = @doc.xpath("//area")
stations.each { |station| str = station
reg = /href="(.*)" title="(.*)"/
...
How does Mechanize::CookieJar differ from the Mechanize::Cookies array? There must be some difference, but after poking around for a bit I can't find a good explanation.
...
I'm using Nokogiri to grab the contents of the title tag on a webpage, but am having trouble with accented characters. What's the best way to deal with these? Here's what I'm doing:
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(open(link))
title = doc.at_css("title")
At this point, the title looks like this:
Rag\30...
About three hours ago I started seeing the above error on my production server. It comes from a call to the sanitize gem:
vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:276:in 'load_missing_constant'
vendor/rails/activerecord/lib/../../activesupport/lib/active_support/dependencies.rb:468:in `const_m...
I'm using the XSLT file found on this blog to pretty-print XML with Nokogiri. It almost works, but not well enough for me to use it for HTML.
First, if a node is empty, it turns it into a self-closing node, so:
<textarea></textarea>
gets converted to
<textarea/>
But that messes up the HTML tree when rendered.
Second, if th...
Hpricot(html).inner_text.gsub("\r"," ").gsub("\n"," ").split(" ").join(" ")
hpricot = Hpricot(html)
hpricot.search("script").remove
hpricot.search("link").remove
hpricot.search("meta").remove
hpricot.search("style").remove
Found it at http://www.savedmyday.com/2008/04/25/how-to-extract-text-from-html-using-rubyhpricot/
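A side note on the whitespace cleanup in the first line of that snippet: `String#split(" ")` already splits on any run of whitespace, including `\r` and `\n`, so the two `gsub` calls appear to be redundant. A minimal pure-Ruby sketch, where the `squish_whitespace` name and the sample string are made up for illustration:

```ruby
# Collapse every run of whitespace (spaces, tabs, \r, \n) into one space
# and drop leading and trailing whitespace.
def squish_whitespace(text)
  text.split(" ").join(" ")
end

sample = "Hello,\r\n   world!\n\tBye.  "

# The longer form from the snippet above, for comparison.
long_form = sample.gsub("\r", " ").gsub("\n", " ").split(" ").join(" ")

puts squish_whitespace(sample)
puts squish_whitespace(sample) == long_form
```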
...
I am trying to scrape a wiktionary entry:
uri = URI.parse("http://en.wiktionary.org/wiki/" + CGI.escape('abjure'))
doc = Nokogiri::HTML(open(uri, 'User-Agent' => 'ruby'))
but doc contains no elements for this word. Other words work fine, and this word used to work. I have no idea what changed. Anyone see anything wrong with thi...
I have a problem:
Firefox adds <tbody> after <table>, whether it's in the source or not. I have no problem with this.
Nokogiri doesn't add it.
I need Nokogiri to emulate Firefox's behavior.
How can I add tbody after the <table> elements of a given HTML page? If tbody is already there, then move on to the next <table>....until all <tbody> tags a...
I am trying to install nokogiri locally on dreamhost using the commands:
$ wget ftp://xmlsoft.org/libxml2/libxml2-2.7.6.tar.gz
$ wget ftp://xmlsoft.org/libxml2/libxslt-1.1.26.tar.gz
$ tar zxvf libxml2-2.7.6.tar.gz
$ cd libxml2-2.7.6
$ ./configure --prefix=$HOME/local/ --exec-prefix=$HOME/local
$ make && make install
$ cd ..
$ tar zxvf l...
I want to make sure that the immediate child of every table is a tbody.
How can I write this with XPath or Nokogiri?
doc.search("//table/").each do |j|
new_parent = Nokogiri::XML::Node.new('tbody',doc)
j.replace new_parent
new_parent << j
end
...
Hi,
I have an HTML document with links, for example:
<html>
<body>
<ul>
<li><a href="http://someurl.com/etc/etc">teste1</a></li>
<li><a href="http://someurl.com/etc/etc">teste2</a></li>
<li><a href="http://someurl.com/etc/etc">teste3</a></li>
</ul>
</body>
</html...
How can I do this? I need to place a tbody after each table tag, basically to emulate Firefox's behavior.
I tried this:
nodes = @doc.css "table > *"
wrapper = nodes.wrap("<tbody></tbody>")
Thanks.
...
doc.xpath("//tbody").remove
removes the tbody's children as well! I only want to remove the tbody tags themselves from the document.
How can I achieve this?
...
I'm stuck with this problem.
cat ~/.rvm/gems/ruby-1.8.7-p249/gems/nokogiri-1.4.1/ext/nokogiri/mkmf.log
gives these errors (clipped):
conftest.c:3: error: 'xmlParseDoc' undeclared (first use in this function)
conftest.c:3: error: (Each undeclared identifier is reported only once
conftest.c:3: error: for each function it appears in.)
F...
I have a huge XML file (>400 MB) containing products. Using a DOM parser is therefore out of the question, so I tried to parse and process it using a pull parser. Below is a snippet from the each_product(&block) method where I iterate over the product list.
Basically, using a stack, I transform each <product> ... </product> node into a hash and process ...
The latest MacRuby release notes (v0.6) state that the authors have managed to get this release working with the SQLite and Nokogiri gems. However when I run sudo macgem install nokogiri I get the following errors:
ERROR: Error installing nokogiri:
extconf failed:
and then a bunch of paths followed by:
libxml2 is missing. try 'por...
I want to replace the inner_text in all paragraphs in my XHTML document.
I know I can get all text with Nokogiri like this:
doc.xpath("//text()")
But I only want to operate on text in paragraphs. How can I select all text in paragraphs without affecting any anchor text inside links?
#For example : <p>some text <a href="/">...
Is there a method to follow a link using Nokogiri for scraping? I know I can extract the href and open it, but I thought I saw a method to do this in Hpricot and was wondering whether Nokogiri has something similar.
...
I'm having trouble scraping a certain long dash that is encoded as ; on the Time magazine site. It looks like this: —. It works fine when the dash is encoded as &mdash;, but when the problem dash is scraped, it is returned as unknown characters. I am using Nokogiri and am wondering whether I have to specify some sort of special encoding? The p...
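One possible cause, and this is an assumption because the question is cut off, is that the dash arrives as the Windows-1252 byte 0x97 rather than as UTF-8. In Windows-1252 that byte is the em dash, and plain Ruby can transcode it. A minimal sketch using only the standard library:

```ruby
# Byte 0x97 is the em dash in Windows-1252 but is not valid UTF-8 by itself.
raw = [0x97].pack("C") # a one-byte binary string

# Label the byte as Windows-1252, then transcode it to UTF-8.
dash = raw.force_encoding("Windows-1252").encode("UTF-8")

puts dash == "\u2014" # compares against U+2014 EM DASH
puts dash.valid_encoding?
```

If that assumption holds, passing an explicit encoding as the third argument to `Nokogiri::HTML` may be worth trying.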