I have a title doc.at('head/title').inner_html that comes out & and it should be &.
My original document is:
<head><title>Foo & Bar</title></head>
but it comes out as the following:
>> doc = Nokogiri::HTML.parse(file, nil, "UTF-8")
>> doc.at('head/title')
=> #<Nokogiri::XML::Element:0x..fdb851bea name="title" children=#<Nokogiri...
This is my first question here, and it would be awesome to find an answer. I am new to using Nokogiri.
Here is my problem. I have something like this in the HTML head on a target site (here, a TechCrunch post):
<meta content="During my time at TechCrunch I've seen thousands of startups and written about hundreds of them. I sure as hell don't know all ...
Hi all
I have just installed Ruby and Mechanize. It seems that what I want to do is possible with Ruby and Nokogiri, but I do not know how to do it.
What about this table? It is just part of the HTML of a vBulletin forum site. I tried to keep the HTML structure but deleted some text and tag attributes. I want to get some details per thread like...
What is the best method, using Ruby/Mechanize/Nokogiri, to click through all pages when there is more than one page I need to access? For example, here: "Page 1 of 34". Should I click the page number or "next"? Or is there a better solution?
...
I have not found any documentation or tutorial for that. Does anything like that exist?
doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')
The code above will get me any table, anywhere, that has a tbody child with the attribute id equal to threadbits_forum_251. But why does it start with a double slash (//)? Why is there /tr at the e...
Is there a way to edit the text of a nokogiri element? I have a nokogiri element that contains a list element (<li>) and I would like to remove some characters from the text while preserving the <li> html. Specifically, I want to remove a leading ":" character in the text if it exists. It doesn't look like there's a text= method for n...
This seems like the hardest problem I have had yet, but maybe I am making it harder than it needs to be. I need to remove an unknown number of nested elements that may or may not be at the beginning of a sentence. The span elements contain a number of words in parentheses. So in the sentence:
(cryptography, slang) An internet firewa...
I'm getting a segfault in nokogiri (1.4.1) run (under cucumber 0.6.1/webrat 0.7.0/rspec 1.3.x)
response.should have_selector("div", :class => "fieldWithErrors")
and the div in the page is actually
<div class="fieldWithErrors validation_error"> stuff </div>
Everything runs fine if I just test nokogiri against a test document
>> req...
I am extracting data from a forum. My script is working fine. Now I need to extract the date and time (21 Dec 2009, 20:39) from a single post. I cannot get it to work. I used FireXPath to determine the XPath.
Sample code:
require 'rubygems'
require 'mechanize'
post_agent = WWW::Mechanize.new
post_page = post_agent.get('http:/...
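For the date-and-time part of the question above, one option is to pull the timestamp out of whatever text the XPath returns and parse it with the standard library. This is only a sketch: the helper name `parse_post_timestamp` is made up, and the pattern assumes every post uses the same "21 Dec 2009, 20:39" format shown in the question:

```ruby
require 'date'

# Matches timestamps shaped like "21 Dec 2009, 20:39" (assumed format).
POST_TIMESTAMP = /\d{1,2} [A-Z][a-z]{2} \d{4}, \d{2}:\d{2}/

# Returns a DateTime for the first timestamp found in the text, or nil.
def parse_post_timestamp(text)
  match = text[POST_TIMESTAMP]
  return nil unless match
  DateTime.strptime(match, '%d %b %Y, %H:%M')
end

# Hypothetical node text with extra whitespace and a post number around it.
stamp = parse_post_timestamp("  #12   21 Dec 2009, 20:39  ")
puts stamp.strftime('%Y-%m-%d %H:%M')
```

Matching on the text this way makes the exact XPath less critical, since any ancestor node whose text contains the timestamp will do.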
I know that there are dozens of ways to select the first child element in Nokogiri, but which is the cheapest?
I can't get around using Node#children, which sounds awfully expensive. Say that there are 10000 child nodes, and I don't want to touch the 9999 others...
...
I'm learning how to use Nokogiri, and a few questions came up based on the code below:
require 'rubygems'
require 'mechanize'
post_agent = WWW::Mechanize.new
post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')
puts "\nabsolute path with tbody gives nil"
puts post_page.parser.xpath('/html/body/div/div/d...
In my ruby+mechanize(nokogiri) script I use this piece of code:
row.at_xpath('td[3]/div[1]/a/text()').to_s.strip
on a forum where the post title html looks like:
<a href="showthread.php?t=233891" ></body> on Footer ?</a>
and I receive from xpath this string </body> on Footer ?
I would like to get what I can see in ...
Last week I started to write a script in Ruby. I needed to scrape some data from the web, so I was advised to use Mechanize and then Nokogiri.
The Mechanize documentation says: Mechanize uses Nokogiri to parse HTML. What does this mean for you? You can treat a Mechanize page like a Nokogiri object. After you have used Mechanize to navig...
Note: I made up the term "horizontal depth" to measure the sub-dimension of a node within a tree.
So imagine a <td> which would have an XPath something like /html/table/tbody/tr/td and a "horizontal depth" of 5.
I am trying to see if there is a way to identify and select elements based on this horizontal depth.
How can I find the maximum depth?
...
Hey Everybody,
I am trying to build a rake task that fetches a product feed and adds it to my db.
task :testme => :environment do
require 'nokogiri'
require 'zlib'
require 'open-uri'
@url = "http://some_url/filename.xml.gz"
@source = open((@url), :http_basic_authentication=>[USERID, "PASSWORD"])
@gz = Zlib::GzipReader.new(@s...
(Hope this isn't a breach of etiquette: I posted this on RailsForum, but I haven't been getting much response from there recently.)
Has anyone else had problems with Mechanize not recognizing anchor tags via CSS selectors?
The HTML looks like this (snippet with white space removed for clarity):
<td class='calendarCell' align='left'>
<...
How do I replace "foo" with "bar"?
From
<h1>foo1<p>foo2<a href="foo3.com">foo4</a>foo5</p>foo6</h1>
to
<h1>bar1<p>bar2<a href="foo3.com">bar4</a>bar5</p>bar6</h1>
I want to replace only the inner content of the tags, not the tag attributes.
Any ideas?
...
Hello,
I'll try to be as explicit as possible.
I am using Nokogiri to parse links from paths and rules stored in a database.
I have this model:
--- !ruby/object:Content
attributes:
id: "2"
name: http://www.****** try
description: try
url_base: http://www.******
scan_flv: /"file","([^<>]*flv)"\);/imu
source_site_id: "2"
con...
I'm using nokogiri to select the 'keywords' attribute like this:
puts page.parser.xpath("//meta[@name='keywords']").to_html
One of the pages I'm working with has the keywords label with a capital "K" which has motivated me to make the query case insensitive.
<meta name="keywords"> AND <meta name="Keywords">
So, my question is: W...
How would one get the contents of the 'value' attribute of a select tag, based on content of the select tag (i.e. the text wrapped by option), using Nokogiri?
For example, given the following HTML:
<select id="options" name="options">
<option value="1">First Option - 4</option>
<option value="2">Second Option - 5</option>
<op...