Hey everyone,
I am currently doing some XML parsing and I've chosen to use Hpricot because of its ease of use and syntax. However, I am running into some problems. I need to write a piece of XML data that I have found out to another file, but when I do this the format is not preserved. For example, if the content should look like t...
The HTML looks like this:
"[font color="#FF0000"]test [font color="#FF0000"]Hello world[/font][/font]"
I want to strip the font tags so that the result looks like this:
test Hello world
Thank you.
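For bracketed font markup like the example above, a plain regular expression may be enough, without Hpricot at all. This is a minimal sketch, assuming the tags always have the form [font ...] and [/font]; the method name strip_font_tags is made up for illustration.

```ruby
# Remove every opening [font ...] and closing [/font] tag, keeping the text.
# Assumes no tag contains a literal "]" inside its attributes.
def strip_font_tags(text)
  text.gsub(/\[\/?font[^\]]*\]/i, "")
end

input = '[font color="#FF0000"]test [font color="#FF0000"]Hello world[/font][/font]'
puts strip_font_tags(input)
```

Because nested tags are removed one by one, the nesting depth does not matter here.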
...
Any idea why hpricot might segfault on this page?
trial_url = 'http://www.controlled-trials.com/ISRCTN56071145/'
doc = Hpricot(open(trial_url))
produces:
/Users/ap257/.gem/ruby/1.8/gems/hpricot-0.8.2/lib/hpricot/parse.rb:33: [BUG] Segmentation fault
ruby 1.8.7 (2009-06-08 patchlevel 173) [universal-darwin10.0]
Abort trap
Please cou...
like this:
<p style="font-size: 12pt;">
Hello world
<span style="font-weight: bold;">just do it</span>
</p>
I want to remove every element's "style" attribute. I want the result like this:
<p>Hello world <span>just do it</span></p>
How can I do this using Hpricot?
Thanks.
OK, I have solved it like this:
doc = Hpricot("<p st...
I'm using Hpricot to parse an html page, but need to get the computed styles for each element. For example, if I have an h1 Hpricot element and the external CSS for the page has a background-image defined for h1's, how can I find out what the background-image is?
...
Hi there,
I'm trying to scrape a page but the initial response has nothing in the body as the content is pumped in asynchronously, e.g. the results from a search on the apple website: http://www.apple.com/uk/search/?q=searching+for+something&sec=global
Any ideas on how I can successfully grab the results from the search with hprico...
Hi,
I'm using Hpricot with Selenium. I have this HTML input element:
<input id="foo:bar"/>
And I'm trying to find it with this XPath expression:
source = Hpricot(@selenium.get_html_source)
source.search("//input[@id='foo:bar']")
but it is not finding anything because of the colon. I have seen that the XPath expression cannot...
I am getting the following encoding error when trying to scrape web pages with Hpricot in Ruby 1.9:
Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8
I can reproduce the error by doing the following:
ska:~ sam$ rvm 1.9.2@hpricot
ska:~ sam$ ruby -v
ruby 1.9.2dev (2010-05-31 revision 28117) [x86_64-dar...
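One common way to hit this error in Ruby 1.9 and later is to combine a binary (ASCII-8BIT) string that holds non-ASCII bytes with a UTF-8 string. Whether that is what happens inside Hpricot here is an assumption; the sketch below only reproduces the error in plain Ruby and shows one possible workaround. The helper names compatible? and as_utf8 are hypothetical.

```ruby
# True if the two strings can be concatenated without an encoding error.
def compatible?(a, b)
  a + b
  true
rescue Encoding::CompatibilityError
  false
end

# Relabel bytes as UTF-8 without changing them.
# Only appropriate when the bytes really are UTF-8 (an assumption).
def as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

binary = "caf\xC3\xA9".b      # bytes as they might arrive from the network
utf8   = "r\u00E9sum\u00E9"   # an ordinary UTF-8 string

puts compatible?(binary, utf8)           # the concatenation raises
puts compatible?(as_utf8(binary), utf8)  # works once relabelled
puts as_utf8(binary).valid_encoding?
```

If the page is not actually UTF-8, relabelling produces an invalid string, so checking valid_encoding? afterwards is a cheap safeguard.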
Hi everyone,
I'm trying to replace all the img tags in an HTML file with the Rails <%= image_tag() %> helper.
What I want to do is something like:
doc = open("myfile.html") { |f| Hpricot(f) }
imgs = doc.search("//img") # here I get all the images as Hpricot::Elements
imgs.each { |i|
# fake function name !
i.replace_by_HTML('<%= image_tag("/images/blab...
How do I do it? E.g.,
<span class="selected" id="hi">HELLO</span>
should become
<span id="hi">HELLO</span>
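Removing an attribute can be sketched with REXML, which ships with Ruby, rather than with Hpricot; the calls below are REXML's, not Hpricot's, and they assume the markup is well-formed XML. The method name remove_class is hypothetical.

```ruby
require "rexml/document"

# Parse the fragment, delete the class attribute, and return the root element.
def remove_class(xml)
  doc = REXML::Document.new(xml)
  doc.root.delete_attribute("class")
  doc.root
end

span = remove_class('<span class="selected" id="hi">HELLO</span>')
puts span.attributes["class"].inspect
puts span.attributes["id"]
puts span.text
```

Only the class attribute is touched; the id and the text content are left as they were.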
...
I have an XML that looks like this:
<data>
<image src="http://www.someweb.com/something.png"/>
</data>
What is the correct way to use Hpricot to extract just 'http://www.someweb.com/something.png'? The closest I can get is this...
>>(doc/"image").first
=> {emptyelem <image src="http://www.someweb.com/something.png">}
I've re...
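Reading an attribute value out of an element can be sketched with REXML from Ruby's standard distribution instead of Hpricot. This is an illustration under that substitution, and image_sources is a made-up method name.

```ruby
require "rexml/document"

# Collect the src attribute of every <image> element in the document.
def image_sources(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//image").map { |el| el.attributes["src"] }
end

xml = <<~XML
  <data>
    <image src="http://www.someweb.com/something.png"/>
  </data>
XML

puts image_sources(xml).first
```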
I need a way of extracting the main text from any webpage that displays an article. Similar to the way that Readability can find the main text on any website that it's run on.
I'm using Ruby on Rails, so I think Hpricot is my best bet. Is what I'm looking for possible in Hpricot? Is there an example somewhere? Thanks for reading.
...
Hi all:
I'm trying to get the largest image off a page I parse with Hpricot and am not having any luck. How do I access the width and height attributes of an img tag with it?
Thanks...Chris
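Picking the largest image by its width and height attributes might look like the sketch below. It uses REXML from Ruby's standard distribution rather than Hpricot, so it only works on well-formed markup, and it assumes that images missing either attribute should be skipped. The method name largest_image is hypothetical.

```ruby
require "rexml/document"

# Return the <img> element with the largest width * height, or nil if
# no image declares both attributes.
def largest_image(html)
  doc = REXML::Document.new(html)
  sized = REXML::XPath.match(doc, "//img").select do |img|
    img.attributes["width"] && img.attributes["height"]
  end
  sized.max_by { |img| img.attributes["width"].to_i * img.attributes["height"].to_i }
end

html = <<~HTML
  <div>
    <img src="small.png" width="10" height="10"/>
    <img src="big.png" width="300" height="200"/>
    <img src="unsized.png"/>
  </div>
HTML

puts largest_image(html).attributes["src"]
```

Attribute values are strings, so they are converted with to_i before being multiplied. Images sized only through CSS will not be found this way.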
...
Can someone please tell me how to convert this line of JavaScript to Ruby using Hpricot and regular expressions?
// Replace all doubled-up <BR> tags with <P> tags, and remove fonts.
var pattern = new RegExp ("<br/?>[ \r\n\s]*<br/?>", "g");
document.body.innerHTML = document.body.innerHTML.replace(pattern, "</p><p>").replace(/<\/?font[^>]*>/g, ...
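The regular-expression part of this translates to Ruby's String#gsub, which replaces every match much like the JavaScript "g" flag. A minimal sketch on a plain string, with no Hpricot involved: the second replacement string is cut off in the snippet above, so the empty string used here is an assumption based on the "remove fonts" comment, and clean_breaks_and_fonts is a made-up name.

```ruby
# Replace doubled-up <br> tags with a paragraph break, then drop font tags.
# In a Ruby regex literal, \s already covers space, \r and \n.
def clean_breaks_and_fonts(html)
  html
    .gsub(/<br\/?>\s*<br\/?>/, "</p><p>")
    .gsub(/<\/?font[^>]*>/, "")   # assumed replacement: empty string
end

puts clean_breaks_and_fonts("a<br><br/>b<font color='red'>c</font>")
```

Like the JavaScript original, both patterns are case-sensitive; adding the i flag would also catch upper-case tags.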
Can someone please explain how to add a custom attribute to an HTML tag using Ruby with the Hpricot gem?
I have a tag that looks like this:
<div class="test" id="tag1" style="">
and I want to add a custom integer attribute, readable=0, so it looks like this:
<div class="test" id="tag1" style="" readable=0>
Is this possible?
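Adding an attribute can be sketched with REXML from Ruby's standard distribution in place of Hpricot. The sketch assumes a closed, well-formed div, and add_readable is a hypothetical method name.

```ruby
require "rexml/document"

# Parse the fragment, add a readable attribute, and return the root element.
def add_readable(xml, value)
  doc = REXML::Document.new(xml)
  doc.root.add_attribute("readable", value.to_s)
  doc.root
end

div = add_readable('<div class="test" id="tag1" style=""></div>', 0)
puts div.attributes["readable"]
puts div.attributes["id"]
```

The value is stored as a string and is quoted when the element is serialized, so the output will not contain the bare form readable=0.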
...
I just want the text out of there without those tags. Does Hpricot.XML have any methods for this?
...
I am completely new to Ruby and Rails. In fact, I created my first Rails application today: it makes an HTTP request to pull back an XML document and then outputs it to the screen. Something simple to get started.
Now I need to parse the XML string, but I am lost on how to do that with Hpricot.
Here is my code so...
I am trying to write a Rails app that takes an XML object and then iterates over the object showing the user the information contained inside the different nodes.
I am completely new to Rails, coming from a PHP background and am having some trouble with a particular function.
I need to basically say, if this node exists die, if it does...
With Hpricot and Nokogiri, the HTML can be fetched and parsed, but can they also execute the JavaScript so that the generated content shows up in the DOM? I ask because some pages won't show the information unless a JavaScript interpreter has run.
...
I'd like to slurp the following data about historical inventions into a convenient Ruby data structure:
http://yootles.com/outbox/inventions.xml
Note that all the data is in the XML attributes.
It seems like there should be a quick solution with a couple lines of code.
With Rails there'd be Hash.from_xml though I'm not sure that would...