nokogiri

Nokogiri: Merge neighbour text nodes recursively?

I have a prepared Nokogiri page where junk is removed... but the text parts are still stored in different nodes... What I want to do is connect all direct neighbour text nodes into one single text node... what I came up with: #merge neighbour text nodes -> connect content def merge_text_nodes(node) previoustext = false node.chi...

/usr/local/lib/libz.1.dylib, file was built for i386 which is not the architecture being linked (x86_64)

Hi, I'm having this problem when installing several things on my Mac. I think it comes from upgrading from Leopard to Snow Leopard, and I think it is also linked with MacPorts. /usr/local/lib/libz.1.dylib, file was built for i386 which is not the architecture being linked (x86_64) Any ideas? Update To be more speci...

webrat + nokogiri + css selectors + whitespaces = Nightmare

I need to test with Cucumber/Webrat the presence of this button: <%=submit_tag 'Get it'%> But when I use this custom step: And I should see a button with a value of "Get it" that is: Then /^I should see a button with a value of "([^\"]*)"$/ do |value| response.should have_selector("form input[value=#{value}]") end I get: ...

passing configure options to rake gems:build

On the server (where I am not root), I have compiled libxslt into /home/foo/sw. So I can install my gem like so: gem install nokogiri -- --with-xslt-dir=/home/foo/sw However, this same technique doesn't work with rake: $ rake gems:build -- --with-xslt-dir=/home/foo/sw (in /home/foo/fooapp/releases/20100915071151) If I try to forc...

Insert &nbsp; in Rails with Nokogiri

Hi, I need to insert the nbsp symbol in some places in HTML that comes from the DB and will be displayed on the page. I do the following: doc = Nokogiri::HTML( self.content ) doc.css("p").each do |p| p.content.gsub! pattern, "&nbsp;" end The resulting text contains nbsp displayed as plain text, not as the special symbol. I also trie...

How can Nokogiri extract the Charset encoding of a scraped HTML document?

Found a snippet that works with the PHP Simple HTML DOM Parser. $el=$html->find('meta[http-equiv=Content-Type]',0); $fullvalue = $el->content; preg_match('/charset=(.+)/', $fullvalue, $matches); echo $matches[1]; Can somebody help me convert this so that it works with Ruby and Nokogiri? ...

has_attribute? problem

I have an HTML document and I need to check whether some attribute is present on the element in question. Suppose that the attribute is not present. When I say: elem.has_attribute? "data-attr" it returns nil instead of "false". When I say: elem["data-attr"].nil? it returns "true", which is what I need. But, when I say: !elem...

How to fail gracefully and get notified if screen scraping fails in ruby on rails

I am working on a Rails 3 project that relies heavily on screen scraping to collect data, mainly using Nokogiri. I'm aggregating essentially the same data, but I'm grabbing it from many different sources, and as time goes on I will be adding more and more. However, I am acutely aware that screen scraping can be notoriously unreliable....

How can I create a nokogiri case insensitive text * search?

Currently I am doing words = [] words << "philip morris" words << "Philip morris" words << "philip Morris" words << "Philip Morris" for word in words doc.search("[text()*='#{word}']") end When I was using Hpricot I found where to downcase the results within the gem so I could just keep all my searches lowercase, however nokogiri has ...

Using xpath with HTML or XML fragment in Nokogiri

I am new to Nokogiri and xpath and I am trying to access all comments in an HTML or XML fragment. The xpaths ".//comment()" and "//comment()" work when I am not using the fragment function, but they do not find anything with a fragment. With a tag instead of a comment, it works with the first xpath. By trial and error, I realized that in...

XML to hash table in Ruby: Parsing list of historical inventions.

I'd like to slurp the following data about historical inventions into a convenient Ruby data structure: http://yootles.com/outbox/inventions.xml Note that all the data is in the XML attributes. It seems like there should be a quick solution with a couple lines of code. With Rails there'd be Hash.from_xml though I'm not sure that would...

Help needed with screen scraping using anemone and nokogiri

I have a starting page of http://www.example.com/startpage which has 1220 listings broken up by pagination in the standard way, e.g. 20 results per page. I have code working that parses the first page of results and follows links that contain "example_guide/paris_shops" in their url. I then use Nokogiri to pull specific data of that final ...

Ruby + Nokogiri: Expand all class="..." attributes to style="..."

Hi. I'm parsing forum threads with Nokogiri and putting them into an RSS feed (the forum itself doesn't have RSS or any other kind of news feed). The problem I've encountered is the following: elements are styled with CSS classes and via selectors in the forum's style file included in the page. I can't include it in the news feed, so I want to replace all...

Nokogiri::XML not creating xml document

Alright, so the ultimate goal here is to parse the data inside of an xml response. The response comes in the format of a ruby string. The problem is that I'm getting an error when creating the xml file from that string (I know for a fact that response.body.to_s is a valid string of xml: <?xml version="1.0" encoding="UTF-8"?> <Response...

How to edit docx with nokogiri and rubyzip

I'm using a combination of rubyzip and nokogiri to edit a .docx file. I'm using rubyzip to unzip the .docx file and then using nokogiri to parse and change the body of the word/document.xml file, but every time I close rubyzip at the end it corrupts the file and I can't open or repair it. I unzip the .docx file on desktop and check th...

Nokogiri and XML Formatting When Inserting Tags

I'd like to use Nokogiri to insert nodes into an XML document. Nokogiri uses the Nokogiri::XML::Builder class to insert or create new XML. If I create XML using the new method, I'm able to create nice, formatted XML: builder = Nokogiri::XML::Builder.new do |xml| xml.product { xml.test "hi" } end puts builder outputs the foll...
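When nodes are added to a document parsed from text, the original whitespace text nodes are kept, and libxml2 then declines to re-indent that part of the tree, which is likely why inserted tags come out unformatted. A sketch of a common workaround, assuming the original whitespace does not matter: parse with the `noblanks` option so blank text nodes are dropped, then serialize with `to_xml`.

```ruby
require 'nokogiri'

xml = <<~XML
  <product>
    <test>hi</test>
  </product>
XML

# noblanks drops whitespace-only text nodes, so the serializer is free to
# indent the whole tree, including nodes added after parsing.
doc = Nokogiri::XML(xml) { |config| config.noblanks }
doc.root.add_child("<test>bye</test>")
puts doc.to_xml
```\n  <test>bye</test>\n</product>")
</test>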

grabbing text between two elements in nokogiri?

<body> <div>some text</div> I NEED THIS TEXT ONLY <div>some text</div> more text here <div>some text</div> one more text here <div>some text</div> </body> How? ...

undefined method `xpath' for nil:NilClass (NoMethodError)

I'm getting the following error, seemingly at random, when trying to extract href links from a Nokogiri doc. Related code: nokohtml = page.doc nokohtml.xpath('//a/@href').each do |node| ...

Nokogiri, different results of xpath in JRuby

Hi, I am getting different results from the same xpath expression in Nokogiri when using Ruby and JRuby. In Ruby the following xpath expression returns a node, while in JRuby it returns a nodeset: parent = node.xpath("./ancestor::node()[name(.) = 'div' or name(.) = 'p'][1]") Has anybody else noticed similar behaviour? Thanks Paul ...

Parsing simple XML with Nokogiri

Let's say I have the following XML : <links> <item> <title>Title 1</title> <url>http://www.example.com/url-1</url> </item> <item> <title>Title 2</title> <url>http://www.example.com/url-2</url> </item> <item> <title>Title 3</title> <url>http://www.example.com/url-3</url> </item> </l...