As the title says, I have some DOM manipulation tasks. For example, I want to:

  - find all H1 elements that are blue,
  - find all text with a font size of 12px,
  - etc.

How can I do it with Rails?

Thank you. :)

Update

I have been doing some research on extracting web page content, based on this paper: http://www.springerlink.com/index/A65708XMUR9KN9EA.pdf

In summary, the steps are:

  1. Get the URL of the web page I want to extract (a single page).
  2. Grab some elements from the page based on visual rules (e.g. grab all H1 elements that are blue).
  3. Process the elements with my algorithm.
  4. Save the result to my database.
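Step 2 can be sketched in Ruby as a small filter over elements whose styles are already known. This is a minimal sketch under loud assumptions: `Element`, `VisualRule` and `grab` are hypothetical names, each element is assumed to arrive with its computed style as a Hash (plain Ruby cannot produce that on its own; a browser has to supply it), and values are compared as exact strings. It only shows the shape of a rule like "H1 that is blue" or "text at 12px", not how the paper implements it.

```ruby
# Hypothetical input for step 2: each element has already been rendered by a
# browser and carries its computed style as a Hash of CSS property => value.
Element = Struct.new(:tag, :text, :style)

# Hypothetical "visual rule", e.g. "h1 whose color is blue" or "anything at 12px".
# A nil tag matches any element; style values are compared as exact strings.
VisualRule = Struct.new(:tag, :property, :value) do
  def match?(element)
    (tag.nil? || element.tag.casecmp?(tag)) &&
      element.style[property] == value
  end
end

# Return every element on the page that satisfies the rule.
def grab(elements, rule)
  elements.select { |el| rule.match?(el) }
end

page = [
  Element.new('h1', 'Intro', { 'color' => 'rgb(0, 0, 255)', 'font-size' => '32px' }),
  Element.new('h1', 'Other', { 'color' => 'rgb(0, 0, 0)', 'font-size' => '32px' }),
  Element.new('p', 'Body', { 'color' => 'rgb(0, 0, 0)', 'font-size' => '12px' })
]

blue_h1 = grab(page, VisualRule.new('h1', 'color', 'rgb(0, 0, 255)'))
small   = grab(page, VisualRule.new(nil, 'font-size', '12px'))
blue_h1.map(&:text) # => ["Intro"]
small.map(&:text)   # => ["Body"]
```

Keeping rules as plain data makes it easy to load them from a config or database and run the same `grab` over every page the crawler fetches.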

(Sorry for my bad English.)

+1  A: 

To put it simply, you don't. Rails is a server-side framework, which means that you generate the HTML (and therefore the DOM) directly. Since you build the page on the server side and you know what you've made, you don't need to manipulate it; you just create it differently. Manipulating the DOM only came about as a hack that we all lovingly refer to as JavaScript: JavaScript manipulates the DOM on the client side once the page has already been generated.
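The idea of generating the markup you want, rather than searching it afterwards, can be illustrated with Ruby's standard ERB library (Rails views use the same ERB template syntax). In this minimal sketch, `render_headings` and the `:text`/`:blue` keys are hypothetical: the server decides which headings get the `blue` class while it builds the page, so there is nothing left to look up.

```ruby
require 'erb'

# Hypothetical template: each heading's class is decided while the markup is
# generated, instead of searching for blue headings afterwards.
HEADINGS_TEMPLATE = <<~TEMPLATE
  <% headings.each do |h| %>
  <h1 class="<%= h[:blue] ? 'blue' : 'plain' %>"><%= ERB::Util.html_escape(h[:text]) %></h1>
  <% end %>
TEMPLATE

# trim_mode '<>' drops the newline after lines that start with <% and end with %>.
def render_headings(headings)
  ERB.new(HEADINGS_TEMPLATE, trim_mode: '<>').result_with_hash(headings: headings)
end

puts render_headings([{ text: 'News', blue: true }, { text: 'A < B', blue: false }])
```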

If you want to manipulate the DOM on the client side, I would recommend jQuery.

N.B. Welcome to SO, by the way. :) We're here to help. If you have a specific example, please feel free to post it.

Robert Massaioli
If you really want to use Ruby, you can look into Silverline and IronRuby, but jQuery is going to be the best solution.
James Deville
Thanks for your response, Shhnap. And... sorry for my bad English. :) Hmm... actually, I don't want to manipulate it on the client side. I want to grab it and then save it in my database. Maybe it's like web scraping.
andrisetiawan
A: 

To reliably work out what color an arbitrary element on a web page is, you would need to reimplement much of a browser (to accurately take into account stylesheets, markup hacks, broken tags, images, etc.).
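To give a feel for what "taking stylesheets into account" involves, here is a deliberately simplified model of one small piece of it: choosing the winning `color` declaration by CSS specificity and source order. It is a sketch under stated assumptions: `Declaration`, `specificity` and `winning_value` are hypothetical names, every declaration is assumed to already match the element, and `!important`, inline styles, inheritance, attribute selectors, pseudo-classes and browser default styles are all ignored. A real browser handles every one of these.

```ruby
# Simplified cascade for a single property. Assumes every declaration
# already applies to the element being examined.
Declaration = Struct.new(:selector, :order, :value)

# Specificity as [ids, classes, types]. Only #id, .class and type selectors
# are counted; attribute selectors and pseudo-classes are ignored.
def specificity(selector)
  compounds = selector.split(/\s+|>|\+|~/).reject(&:empty?)
  ids     = selector.scan(/#[\w-]+/).size
  classes = selector.scan(/\.[\w-]+/).size
  types   = compounds.count { |c| c.match?(/\A[a-zA-Z]/) }
  [ids, classes, types]
end

# Highest specificity wins; ties go to the declaration that appears later.
# Ruby compares arrays lexicographically, so [specificity, order] ranks correctly.
def winning_value(declarations)
  best = declarations.max_by { |d| [specificity(d.selector), d.order] }
  best&.value
end

rules = [
  Declaration.new('h1', 0, 'black'),
  Declaration.new('#main h1.title', 1, 'blue'),
  Declaration.new('h1', 2, 'red')
]
winning_value(rules) # => "blue"
```

Even this toy version needs the selector text, the source order and a specificity rule; handling real pages adds many more layers, which is why embedding a real engine is the more practical route.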

A far easier approach would be to embed an existing browser engine such as Gecko in a custom application of your own.

As your spider browses pages, it would pass them to your embedded Gecko instance, where you could use getComputedStyle to find out what color an individual element actually is.
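getComputedStyle typically reports colors in a serialized form such as `rgb(0, 0, 255)` or `rgba(0, 0, 255, 0.5)`, so deciding whether an element "is blue" still needs a rule on top of it. The sketch below converts that string to an HSV hue and applies a threshold. The helper names and the thresholds (hue between 200 and 260 degrees, saturation of at least 0.4) are hypothetical assumptions for illustration, not something taken from this answer or from the paper.

```ruby
# Parse a serialized computed color like "rgb(0, 0, 255)" or
# "rgba(0, 0, 255, 0.5)" into [r, g, b] clamped to 0..255; nil otherwise.
def parse_rgb(str)
  m = str.match(/\Argba?\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})/)
  m && m.captures.map { |v| v.to_i.clamp(0, 255) }
end

# HSV hue in degrees (0...360) for 0..255 channel values.
def hue(r, g, b)
  max = [r, g, b].max
  min = [r, g, b].min
  return 0.0 if max == min # grey: hue is undefined, report 0
  d = (max - min).to_f
  h = case max
      when r then ((g - b) / d) % 6
      when g then ((b - r) / d) + 2
      else        ((r - g) / d) + 4
      end
  h * 60
end

# Hypothetical rule: "blue" means hue in 200..260 degrees and saturation >= 0.4.
def blue?(css_color)
  rgb = parse_rgb(css_color)
  return false unless rgb
  max = rgb.max
  return false if max.zero? # black (or transparent black) is not blue
  saturation = (max - rgb.min).to_f / max
  (200..260).cover?(hue(*rgb)) && saturation >= 0.4
end
```

A hue range rather than an exact value lets shades like `rgb(30, 144, 255)` count as blue across sites that use different palettes.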

You originally mentioned wanting to use Ruby on Rails for this project, but Rails is a framework for writing web applications that present content, and it is really a bad fit for a project like this.

As a starting point, I'd recommend you check out RubyGnome, and in particular RubyGnome's Gtk::MozEmbed functionality.

Mike Buckbee
Thanks, Mike. Does it work with CSS properties too? For example, I want to select only the H1 elements that are blue.
andrisetiawan
This isn't what the OP wants. He wants to do all the processing on the server side, not in JavaScript.
musicfreak
I posted my answer prior to his update (when it did appear that he wanted a client side solution).
Mike Buckbee
+1  A: 

If what you're trying to do is manipulate HTML documents inside a Rails application, you should take a look at Nokogiri.

It can search through the document with XPath. The following finds every h1 with the "blue" CSS class in a document:

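require 'nokogiri'
require 'open-uri'

# Download the page and parse it (URI.open is needed for URLs since Ruby 3.0).
doc = Nokogiri::HTML(URI.open('https://stackoverflow.com'))

# Every <h1> whose class attribute contains the token "blue".
doc.xpath('//h1[contains(concat(" ", normalize-space(@class), " "), " blue ")]').each do |heading|
  puts heading.content
end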

If, on the other hand, what you were trying to do was work with the DOM of the page currently displayed in the browser, you should take a look at JavaScript and jQuery. Rails can't do that.

Damien MATHIEU
This was my original approach as well, but if you read through the paper's synopsis, he's not asking for elements with a CSS class of "blue"; he wants elements that are actually rendered in blue, across multiple sites with potentially wildly different CSS and markup schemes.
Mike Buckbee
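Mike's point can be made concrete: a class name such as `blue` says nothing reliable about the rendered color. Without a browser, about the most plain Ruby can do is read declarations from an element's inline `style` attribute (for instance the string Nokogiri returns for `node['style']`). The `parse_inline_style` helper here is a hypothetical minimal sketch: it ignores stylesheets, inheritance and `!important`, and it does not handle CSS comments or semicolons inside quoted strings or `url(...)` values.

```ruby
# Parse an inline style attribute such as "color: blue; font-size: 12px" into
# a Hash of lowercase property => value. A later duplicate property overwrites
# an earlier one; !important is not treated specially.
def parse_inline_style(style)
  style.to_s.split(';').each_with_object({}) do |decl, props|
    name, value = decl.split(':', 2)
    next if value.nil? # skip fragments without a colon
    name = name.strip.downcase
    value = value.strip
    props[name] = value unless name.empty? || value.empty?
  end
end

props = parse_inline_style('color: #00F; font-size: 12px')
props['color']     # => "#00F"
props['font-size'] # => "12px"
```

Anything styled through an external stylesheet will simply not show up here, which is exactly why the color question ultimately needs a real rendering engine.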