Suppose I have this HTML:
html = <div>Four score and seven years ago</div>
What's the best way to insert (say) an anchor tag after the word "score"? Note: I want to do this in terms of DOM manipulation (with Hpricot, e.g.), not in terms of text manipulation (e.g., no regexes).
...
Suppose I have the following HTML:
html = Four score and seven <b>years ago</b>
I want to parse this with Hpricot:
doc = Hpricot(html)
Find the <b> node:
node = doc.at('b')
and then get the character index of the <b> node within its parent:
node.character_index
=> 22
How can I do this (i.e., what's the real version of the cha...
Now that http://github.com/why/hpricot/wikis/home no longer exists.
...
Hey--I'm writing a basic Rails app that uses the Digg API. I'm trying to parse the XML data that Digg's API provides with Hpricot, but when testing the page, the browser hangs until I eventually catch the Timeout::Error exception.
Here's the code for the controller:
require 'rubygems'
require 'hpricot'
require 'open-uri'
appkey = 'htt...
I'm trying to consume some legacy XML with elements like this in JRuby:
<x-doc attr="value">
<nested>
<with.dot>content</with.dot >
</nested>
</x-doc>
I've been working with Hpricot, but Hpricot's HTML-oriented shortcuts are working against me: doc.search("//with.dot") seems to be looking for <with class="dot" />
(I ran into ...
I am attempting to parse a Wiktionary entry to retrieve all English definitions. I am able to retrieve all definitions; the problem is that some definitions are in other languages. What I would like to do is somehow retrieve only the HTML block with English definitions. I have found that, in the case that there are other language entri...
There are lots of examples of how to strip HTML tags from a document using Ruby; Hpricot and Nokogiri have inner_text methods that remove all HTML for you easily and quickly.
What I am trying to do is the opposite: remove all the text from an HTML document, leaving just the tags and their attributes.
I considered looping through the do...
I know I can parse XML using Hpricot, but is it also possible to create files? All the tutorials I found only demonstrate parsing.
...
I'm trying to install the hpricot gem on my Windows machine using JRuby 1.4.0RC1. I'm trying to follow the advice given in a related question (see -> http://stackoverflow.com/questions/726412/installing-hpricot-for-jruby/1323619#1323619).
Per the answer's advice I pulled the git head of hpricot and from its dir ran:
jruby -S rake package...
Given:
require 'rubygems'
require 'nokogiri'
value = Nokogiri::HTML.parse(<<-HTML_END)
"<html>
<body>
<p id='para-1'>A</p>
<div class='block' id='X1'>
<h1>Foo</h1>
<p id='para-2'>B</p>
</div>
<p id='para-3'>C</p>
<h2>Bar</h2>
<p id='para-4'>D</p>
<p id='para-5'>E</p>
<div class='block' id='X2'>
<p id='para-6...
I'm having problems deciding between hpricot and scrubyt and I was wondering if someone who has worked with them could provide an advantages/disadvantages list for each.
...
I have a Twitter app that works fantastically locally - it searches for keywords, then for each user it grabs their info using Hpricot to parse the XML, e.g.
Hpricot(open("http://twitter.com/users/show/"+myuser+".xml"))
It works fine locally, but when I go live it fails. Looking at my log, I get this error:
OpenURI::HTTPError (400 Bad Request):
...
My xml:
http://www.google.ru/ig/api?weather=Chelyabinsk
<forecast_information>
<city data="Chelyabinsk, Province of Chelyabinsk"/>
</forecast_information>
How do I get the city data, for example? Not inner_html, just attributes like city data, postal code, etc.
...
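A minimal sketch of reading an attribute value rather than inner HTML, using the weather XML from the question above. It swaps in Ruby's bundled REXML library instead of Hpricot so that it runs without extra gems; the helper name attribute_of and the constant WEATHER_XML are made up for illustration. Nokogiri elements, and as far as I recall Hpricot elements too, expose attributes through a similar [] lookup on the element.

```ruby
require 'rexml/document'

# Sample copied from the question; the real feed contains more elements.
WEATHER_XML = <<~XML
  <forecast_information>
    <city data="Chelyabinsk, Province of Chelyabinsk"/>
  </forecast_information>
XML

# Hypothetical helper: returns the value of the named attribute on the first
# element with the given name, or nil if no such element exists.
def attribute_of(xml, element_name, attribute_name)
  doc = REXML::Document.new(xml)
  element = REXML::XPath.first(doc, "//#{element_name}")
  element && element.attributes[attribute_name]
end

puts attribute_of(WEATHER_XML, 'city', 'data')
# prints "Chelyabinsk, Province of Chelyabinsk"
```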
I want to go through the children of an element and filter only the ones that are text or span, something like:
element.children.select {|child|
child.class == String || child.element_type == 'span'
}
but I can't find a way to test which type a certain element is. How do I test that? I'd like to know that regardless of whether there's a bet...
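One way to express the "text or span" filter above is to test each child's class and, for elements, its tag name. The sketch below uses Ruby's bundled REXML rather than Hpricot, so the class names (REXML::Text, REXML::Element) are REXML's and not Hpricot's; the helper name text_or_span_children and the sample markup are made up for illustration. If memory serves, Hpricot nodes offer text? and elem? predicates that play the same role, but treat that as an assumption to check.

```ruby
require 'rexml/document'

# Hypothetical helper: keeps only the text nodes and <span> elements among
# the direct children of the given element.
def text_or_span_children(element)
  element.children.select do |child|
    child.is_a?(REXML::Text) ||
      (child.is_a?(REXML::Element) && child.name == 'span')
  end
end

# Made-up sample: the <b> child should be filtered out.
sample = REXML::Document.new('<p>one<span>two</span><b>three</b>four</p>')
kept = text_or_span_children(sample.root)
puts kept.map(&:to_s).inspect
```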
Hello,
I have the following HTML doc:
<ul>
<li><span>Some text</span></li>
<li><span>Some other text</span></li>
<li><span>Some more text</span></li>
</ul>
How can I use Hpricot to loop over the list items and insert some new HTML at the beginning of each, so that I get the following:
<ul>
<li><span>1</span><span>Some text</...
I've been playing around with Hpricot, but after a fair amount of searching, I've not been able to work this out.
I'm trying to parse an HTML page and find all tags with an href to an mp3 file. So far I've got
<ul>
<% @page.search('//a[@href*=mp3]').each do |link| %>
<li>
<%= link.inner_text %>
</li...
I want to match links like <a href="mailto:[email protected]">foo</a>, but this only works in Nokogiri:
doc/'a[href ^="mailto:"]'
What's the right way of doing that? How do I do that with Hpricot?
...
Let's say this is the location element:
<.location>blah...<./location>
It can be empty like this:
<.location/>
Is there a way to detect the slash in the empty element in order to not return it?
...
Hi, I have read a great many tutorials on Hpricot, but the problem I am finding is that it is not scraping all of the HTML, so to speak. I'll elaborate:
The website I am attempting to scrape HTML from is http://yellowpages.com.mt/Malta-Search/Radio-In-Malta-Gozo.aspx .
I need to obtain the links that are listed as resu...
Hi, I am having a problem scraping the code I need in order to extract information for a web mashup I'm creating.
Basically, I am trying to scrape code from:
http://yellowpages.com.mt/Meranti-Ltd-In-Malta-Gozo;/Hair-Accessories;Hijjhkikke=Hiojhhfokje.aspx
This is just one of the pages I will need to scrape and hence I cannot feed the program d...