questions about hpricot | ansaurus

hpricot

How to insert a DOM node at a specific character index in an existing node (with Hpricot or a similar Ruby library)

Suppose I have this HTML: html = <div>Four score and seven years ago</div> What's the best way to insert (say) an anchor tag after the word "score"? Note: I want to do this in terms of DOM manipulation (with Hpricot, e.g.) not in terms of text manipulation (e.g., no regexes) ...
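One possible approach, sketched here with REXML (an XML library that ships with Ruby) rather than Hpricot: split the text node at the target offset and place the new element between the two halves. The helper name `insert_element_at` and the `href` value are illustrative assumptions, and the sketch assumes the parent's first text child contains the offset.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): splits the parent's first text
# node at char_index and inserts new_element between the two halves.
# Assumes the first text child is long enough to contain char_index.
def insert_element_at(parent, char_index, new_element)
  text_node = parent.texts.first
  full_text = text_node.value
  text_node.value = full_text[0...char_index]
  parent.insert_after(text_node, new_element)
  # Pass true so REXML keeps the whitespace of the trailing half untouched.
  tail = REXML::Text.new(full_text[char_index..-1], true)
  parent.insert_after(new_element, tail)
  parent
end

doc = REXML::Document.new('<div>Four score and seven years ago</div>')
div = doc.root
# Offset of the character just after the word "score".
offset = div.text.index('score') + 'score'.length

anchor = REXML::Element.new('a')
anchor.add_attribute('href', '#') # placeholder target
insert_element_at(div, offset, anchor)
puts doc
```

The offset is computed with a plain substring search only to locate the word; the insertion itself is done on the node tree, not on the serialized markup.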

Find the character index of a node within its parent node with Hpricot

Suppose I have the following HTML: html = Four score and seven <b>years ago</b> I want to parse this with Hpricot: doc = Hpricot(html) Find the <b> node: node = doc.at('b') and then get the character index of the <b> node within its parent: node.character_index => 22 How can I do this (i.e., what's the real version of the cha...
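A rough sketch of one way to compute such an offset, using REXML instead of Hpricot: add up the text lengths of every sibling that precedes the node. The names `character_index` and `text_length` are hypothetical helpers, the offset is counted from zero, and the fragment is wrapped in a `<p>` element (an assumption) because REXML needs a single root.

```ruby
require 'rexml/document'

# Hypothetical helper: total number of text characters inside a node,
# descending into child elements.
def text_length(node)
  return node.value.length if node.is_a?(REXML::Text)
  return 0 unless node.is_a?(REXML::Parent)

  node.children.sum { |child| text_length(child) }
end

# Hypothetical helper: zero-based character offset of `node` within the
# text content of its parent.
def character_index(node)
  offset = 0
  node.parent.children.each do |sibling|
    break if sibling.equal?(node)

    offset += text_length(sibling)
  end
  offset
end

doc = REXML::Document.new('<p>Four score and seven <b>years ago</b></p>')
bold = doc.root.elements['b']
puts character_index(bold)
```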

Where can I find Hpricot documentation?

Now that http://github.com/why/hpricot/wikis/home no longer exists. ...

Timeout Error with Hpricot in Rails Controller

Hey--I'm writing a basic Rails app that uses the digg API. I'm trying to parse the xml data that digg's api provides with hpricot, but when testing the page, the browser hangs until I eventually catch the Timeout::Error exception. Here's the code for the controller: require 'rubygems' require 'hpricot' require 'open-uri' appkey = 'htt...

Parse XML with JRuby (Hpricot?) with tags like <foo.bar>

I'm trying to consume some legacy XML with elements like this in JRuby: <x-doc attr="value"> <nested> <with.dot>content</with.dot > </nested> </x-doc> I've been working with Hpricot, but Hpricot's HTML-oriented shortcuts are working against me: doc.search("//with.dot") seems to be looking for <with class="dot" /> (I ran into ...
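For comparison, a minimal sketch with REXML, which treats `with.dot` as an ordinary XML name rather than as a CSS-style class selector. This swaps in a different library from the one in the question; the helper name `texts_for` is an assumption made for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the text of every element whose tag name
# is exactly `name`, found anywhere in the document.
def texts_for(xml, name)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//#{name}").map(&:text)
end

legacy_xml = <<~XML
  <x-doc attr="value">
    <nested>
      <with.dot>content</with.dot>
    </nested>
  </x-doc>
XML

p texts_for(legacy_xml, 'with.dot')
```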

Searching all elements before an h2 element in hpricot/nokogiri

I am attempting to parse a Wiktionary entry to retrieve all English definitions. I am able to retrieve all definitions; the problem is that some definitions are in other languages. What I would like to do is somehow retrieve only the HTML block with English definitions. I have found that, in the case that there are other language entri...

Strip text from HTML document using Ruby

There are lots of examples of how to strip HTML tags from a document using Ruby; Hpricot and Nokogiri have inner_text methods that remove all HTML for you easily and quickly. What I am trying to do is the opposite: remove all the text from an HTML document, leaving just the tags and their attributes. I considered looping through the do...

Is it possible to create XML files using Hpricot?

I know I can parse XML using Hpricot, but is it also possible to create files? All the tutorials I found only demonstrate parsing. ...
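As an illustration of generating XML in Ruby, here is a small sketch that uses REXML rather than Hpricot. The element names (`catalog`, `book`), the `id` attribute, and the helper `build_catalog` are all hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: builds a small document with one <book> element
# per title, numbered from 1 via an `id` attribute.
def build_catalog(titles)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  root = doc.add_element('catalog')
  titles.each_with_index do |title, i|
    book = root.add_element('book', 'id' => (i + 1).to_s)
    book.add_text(title)
  end
  doc
end

output = String.new
build_catalog(['First title', 'Second title']).write(output)
puts output
```

To produce a file, the same `write` call can be given an open `File` object instead of a string.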

NoClassDefFoundError on org.jruby.Main

I'm trying to install the hpricot gem on my Windows machine using JRuby 1.4.0RC1. I'm trying to follow the advice from a related question (see -> http://stackoverflow.com/questions/726412/installing-hpricot-for-jruby/1323619#1323619). Per the answer's advice I pulled the git head of hpricot and from its dir ran: jruby -S rake package...

How to do a regex search in nokogiri

given: require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) "<html> <body> <p id='para-1'>A</p> <div class='block' id='X1'> <h1>Foo</h1> <p id='para-2'>B</p> </div> <p id='para-3'>C</p> <h2>Bar</h2> <p id='para-4'>D</p> <p id='para-5'>E</p> <div class='block' id='X2'> <p id='para-6...

Hpricot or scRUBYt

I'm having problems deciding between hpricot and scrubyt and I was wondering if someone who has worked with them could provide an advantages/disadvantages list for each. ...

Why does the Twitter API return a 400 error in production?

I have a Twitter app that works fantastically locally - it searches for keywords, then for each user it grabs their info using Hpricot to parse the XML, e.g. Hpricot(open("http://twitter.com/users/show/"+myuser+".xml")) Works fine locally, but when I go live it fails. Looking at my log I get this error: OpenURI::HTTPError (400 Bad Request): ...

Parse XML with hpricot, get attributes.

My xml: http://www.google.ru/ig/api?weather=Chelyabinsk <forecast_information> <city data="Chelyabinsk, Province of Chelyabinsk"/> </forecast_information> How to get city data for example? Not inner_html, just attributes like city data, postal code etc. ...
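A minimal sketch of reading an attribute value, shown with REXML rather than Hpricot and using only the fragment quoted above (the real feed presumably contains more elements). The helper name `city_data` is an assumption.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the `data` attribute of the first <city>
# element under <forecast_information>, or nil when there is none.
def city_data(xml)
  doc = REXML::Document.new(xml)
  city = REXML::XPath.first(doc, '//forecast_information/city')
  city && city.attributes['data']
end

sample = <<~XML
  <forecast_information>
    <city data="Chelyabinsk, Province of Chelyabinsk"/>
  </forecast_information>
XML

puts city_data(sample)
```

Hpricot elements can reportedly be indexed by attribute name in a similar way (for example `doc.at('city')['data']`), but that call is an untested assumption here.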

Get the type of an element in Hpricot

I want to go through the children of an element and filter only the ones that are text or span, something like: element.children.select {|child| child.class == String || child.element_type == 'span' } but I can't find a way to test which type a certain element is. How do I test that? I'd like to know that regardless if there's a bet...

screen-scraping

Hpricot looping with index?

Hello, I have the following HTML doc: <ul> <li><span>Some text</span></li> <li><span>Some other text</span></li> <li><span>Some more text</span></li> </ul> How can I use Hpricot to loop over the list items and insert some new HTML at the beginning of each, so that I get the following: <ul> <li><span>1</span><span>Some text</...
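The same idea can be sketched with REXML in place of Hpricot: iterate over the `<li>` elements with an index and prepend a new `<span>` holding the counter. The helper name `number_list_items` is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: prepends <span>N</span> to each <li> child of
# `list`, where N counts the items starting from 1.
def number_list_items(list)
  list.get_elements('li').each_with_index do |item, index|
    counter = REXML::Element.new('span')
    counter.text = (index + 1).to_s
    item.unshift(counter) # insert as the first child of the <li>
  end
  list
end

html = '<ul><li><span>Some text</span></li>' \
       '<li><span>Some other text</span></li>' \
       '<li><span>Some more text</span></li></ul>'
doc = REXML::Document.new(html)
number_list_items(doc.root)
puts doc
```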

Finding linked files with Hpricot

I've been playing around with Hpricot, but after a fair amount of searching, I've not been able to work this out. I'm trying to parse an HTML page and find all tags with an href to an mp3 file. So far I've got <ul> <% @page.search('//a[@href*=mp3]').each do |link| %> <li> <%= link.inner_text %> </li...

How to detect mailto links with Hpricot/Nokogiri

I want to match links like <a href="mailto:[email protected]">foo</a>, but this only works in Nokogiri: doc/'a[href ^="mailto:"]' What's the right way of doing that? How do I do that with Hpricot? ...
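One library-neutral option is the XPath 1.0 `starts-with` function instead of a CSS attribute selector. The sketch below runs it through REXML (a swapped-in library, and one that needs well-formed markup); the helper name `mailto_links` and the sample address are placeholders. Nokogiri's `xpath` method should accept the same expression, while Hpricot's support for it is not verified here.

```ruby
require 'rexml/document'

# Hypothetical helper: returns every <a> element whose href attribute
# begins with "mailto:".
def mailto_links(doc)
  REXML::XPath.match(doc, "//a[starts-with(@href, 'mailto:')]")
end

# Placeholder markup with a placeholder address.
html = '<div><a href="mailto:user@example.com">foo</a>' \
       '<a href="http://example.com/">bar</a></div>'
doc = REXML::Document.new(html)
mailto_links(doc).each { |link| puts link.attributes['href'] }
```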

Checking emptiness of an element in hpricot

Let's say this is the location element: <location>blah...</location> It can be empty like this: <location/> Is there a way to detect the trailing slash in the empty element in order to not return it? ...

HTML Scraping with Hpricot (Using Ruby on Rails)

Hi, I have read a good many Hpricot tutorials, but the problem I keep running into is that it does not scrape all of the HTML, so to speak. I'll elaborate: the website I am attempting to scrape HTML from is http://yellowpages.com.mt/Malta-Search/Radio-In-Malta-Gozo.aspx . I need to obtain the links that are listed as resu...

HTML / Script Scraping Google Map using Hpricot (Ruby on Rails)

Hi, I am having a problem scraping code that I need in order to extract information for a web mashup I'm creating. Basically, I am trying to scrape code from: http://yellowpages.com.mt/Meranti-Ltd-In-Malta-Gozo;/Hair-Accessories;Hijjhkikke=Hiojhhfokje.aspx This is just one of the pages I will need to scrape, and hence I cannot feed the program d...
