Hello,
I have a quick question. I am currently writing a Nokogiri/Ruby script and have the following code:
fullId = doc.xpath("/success/data/annotatorResultBean/annotations/annotationBean/concept/fullId")
fullId.each do |e|
e = e.to_s()
g.write(e + "\n")
end
This spits out the following text:
<fullId>D00...
Example:
<fruit name="mango"/>
I want to get output as:
name="mango"
...
Does anyone know of any Ruby libraries/gems that allow you to traverse a DOM quickly?
I need something which is fast, and doesn't have a lot of dependencies. I've been trying to use Nokogiri, but I'm concerned with the number of 'bug segmentation faults' I've been getting.
...
I'm trying to extract each a href link on an html page for evaluation w/ nokogiri and xpath. What I have so far seems to be pulling the page titles out only. I'm not interested in the link title, but rather just the URL that is being pointed to.
Here's what I have:
doc = Nokogiri::HTML(open("http://www.cnn.com"))
doc.xpath('//a').each...
Hello,
I'm having trouble figuring out why I can't get keywords to parse properly through nokogiri. In the following example, I have the a href link text functionality working properly but cannot figure out how to pull the keywords.
This is the code I have thus far:
.....
doc = Nokogiri::HTML(open("http://www.cnn.com"))
doc.xpath('/...
Hi,
I have an HTML document that should be transformed by replacing some tags with other tags.
I don't know these tags in advance, because they will come from the db. So the "set_attribute" or "name" methods of Nokogiri are not suitable for me.
I need to do it in a way like this pseudo-code:
def preprocess_content
doc = Nokogiri::HTML( self...
Say we have an HTML document in which all ...
<div class="replace-me">
</div>
... must be replaced with
<video src='my_video.mov'></video>
The code is as follows:
doc.css("div.replace-me").each do |div|
div.replace "<video src='my_video.mov'></video>"
end
It's simple but, unfortunately, it doesn't work for me. Nokogiri crashes with f...
Hi,
How can I use nokogiri to split the following HTML into text nodes? I want to somehow split the content by using the <br/> tag as a delimiter or sadly an unclosed <br> which is often the case in the HTML I am scraping.
An example of the html would be:
<td>
<font size="2" face="Arial"><b>HALL (J&E) LTD</b><br>
...
Right now, I am splitting the HTML document into small pieces like this:
(regular expression simplified - skipping header tag content and closing tag)
document.at('body').inner_html.split(/<\s*h[2-6][^>]*>/i).collect do |fragment|
Nokogiri::HTML(fragment)
end
Is there an easier way to perform that splitting?
The document is very simple, j...
This error comes up in Redhat Enterprise Linux Server 5.4 - 64 bit.
Linux rhl-64-tibbr5 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
There is also this error in the stack trace.
uninitialized constant Nokogiri::VERSION_INFO
More version details:
jruby-1.4.0RC1
ruby/gems/1.8/gems/activesupport-2....
I need to parse an XML stylesheet declaration:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/templates/xslt/inspections/disclaimer_en.xsl"?>
Using nokogiri I have tried using
doc.search("?xml-stylesheet").first['href']
but I get the error
`on_error': unexpected '?' after '' (Nokogiri::CSS::SyntaxErro...
Admittedly, I'm a Nokogiri newbie and I must be missing something...
I'm simply trying to print the author > name node out of this XML:
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:gd="http://schemas.google.com/g/2005" xmlns:docs="http://schemas.google.com/docs/2007" xmlns="http://www.w3.org/2005/Atom" gd:etag="">
<category te...
I am parsing an XML doc that looks something like this:
<MyBook>
<title>Favorite Poems</title>
<issn>123-456</issn>
<pages>45</pages>
</MyBook>
<MyBook>
<title>Chocolate Desserts</title>
<issn>654-098</issn>
<pages>100</pages>
</MyBook>
<MyBook>
<title>Jabberwocky</title>
<issn>454-545</issn>
<pages>19</pages>...
I want to get the rows that contain more than 3 columns.
How do I write the XPath with Nokogiri?
require 'rubygems'
require 'nokogiri'
item='sometext'
doc = Nokogiri::HTML.parse(open(item))
data=doc.xpath('/html/body/table/tr[@td.size>3]')
puts data
It does not run. Help and advice appreciated.
...
I'm trying to use Nokogiri to parse an HTML file with some fairly eccentric markup. Specifically, I'm trying to grab divs which have ids, multiple classes, and styles defined. The markup looks something like this:
<div id="foo">
<div id="bar" class="baz bang" style="display: block;">
<h2>title</h2>
<dl>
List of stu...
I'm trying to add an attribute to an existing Nokogiri node. What I've done is this:
node.attributes['foobar'] = Nokogiri::XML::Attr.new('foo', 'bar')
But I get the error:
TypeError Exception: wrong argument type String (expected Data)
What is a Data data type, and how do I add an attribute to the Nokogiri object?
Thanks!
...
I would like to collect and store all this info into an array.
I have the following, how should I refactor this?
require 'rubygems'
require 'nokogiri'
require 'open-uri'
@urls = %w{http://url_01.com http://url_02.com http://url_03.com}
@link_01_arr = []
@link_02_arr = []
@link_03_arr = []
link_01 = Nokogiri::HTML(open("#{@urls[0]}"...
Hi,
I'd like to use Nokogiri to extract all nodes in an element that contain a specific attribute name.
e.g., I'd like to find the 2 nodes that contain the attribute "blah" in the document below.
@doc = Nokogiri::HTML::DocumentFragment.parse <<-EOHTML
<body>
<h1 blah="afadf">Three's Company</h1>
<div>A love triangle.</div>
<b b...
When trying Hpricot and Nokogiri, the HTML can be fetched and parsed, but can they also execute the JavaScript so that the content shows on the page (shows up in the DOM)? That's because some pages won't show the info unless the JavaScript interpreter has run.
...
I have a problem validating a perfectly valid XML document against its schema file in Ruby. It works OK on my development machine (OS X 10.6) but fails every time on the production system (Debian 4.1).
The part of the XML that gives errors is this:
<ROUNDINGS>-0.02</ROUNDINGS>
And the XSD pattern is this:
<xsd:element name="ROUNDINGS">
<xsd:s...