Saving the HTML of a web page using Ruby is very easy. One way to do it is by using rio:

require 'rubygems'
require 'rio'
rio('http://www.google.com') > rio('google.html')

It is possible to do the same by parsing the HTML, requesting the different images, JS and CSS files again, and then saving each of them, but I think that is not very efficient. So, is there a way to save a web page plus all the images, CSS and JavaScript related to that page, all of it automatically?
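For reference, the manual approach described above could look roughly like the sketch below, assuming the nokogiri and open-uri libraries (neither appears in the rio example above); the output directory and error handling are only illustrative:

require 'open-uri'
require 'nokogiri'
require 'fileutils'
require 'uri'

page_url = 'http://www.google.com'
out_dir  = 'saved_page'          # illustrative output directory
FileUtils.mkdir_p(out_dir)

# Save the page's own HTML first.
html = URI.open(page_url).read
File.write(File.join(out_dir, 'index.html'), html)

# Collect the images, scripts and stylesheets the page references.
doc    = Nokogiri::HTML(html)
assets = doc.css('img[src], script[src]').map { |n| n['src'] } +
         doc.css('link[rel="stylesheet"][href]').map { |n| n['href'] }

# Download each asset next to the HTML, skipping anything that fails.
assets.compact.uniq.each do |ref|
  asset_url = URI.join(page_url, ref).to_s   # resolve relative references
  filename  = File.basename(URI(asset_url).path)
  next if filename.empty?
  File.binwrite(File.join(out_dir, filename), URI.open(asset_url).read)
rescue OpenURI::HTTPError, URI::InvalidURIError, Errno::ENOENT
  next
end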

+1  A: 

What about system("wget -r -l 1 http://google.com")?
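Expanding on that a little (a sketch only, not part of the original answer): wget's -p (--page-requisites) flag also fetches the images, CSS and JavaScript the page needs, and Kernel#system returns true or false, so the call can be checked from Ruby:

url = 'http://www.google.com'
# -p fetches page requisites, -k rewrites links for local viewing,
# -E adjusts file extensions; passing arguments separately avoids shell quoting issues.
ok = system('wget', '-p', '-k', '-E', url)
warn "wget failed for #{url}" unless ok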

dimus
A: 

Most of the time we can use the system's tools. As dimus said, you can use wget to download the page.

And there are many useful APIs for network problems, such as net/ftp, net/http or net/https; see the Net::HTTP documentation for details. But these methods only get the response; what we still need to do is parse the HTML document. Going further, using Mozilla's lib is a good way.
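For example, a minimal net/http fetch (the URL and output filename are only placeholders) shows what "only getting the response" means, since none of the referenced assets are downloaded:

require 'net/http'
require 'uri'

uri  = URI('http://www.google.com')
body = Net::HTTP.get(uri)        # returns the response body as a String
File.write('google.html', body)  # the page's images/CSS/JS are not fetched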

Qianjigui
As for parsing the HTML of a web page, I'm using Nokogiri. However, it only retrieves the HTML, not the images, CSS or JS. The system wget is good, but I cannot control the whole process directly from Ruby. Is there an equivalent of wget in Ruby?
massinissa
I have written a simple web browser with Mozilla's lib in Ruby. Maybe this is a good way to solve the problem?
Qianjigui
http://ruby-gnome2.sourceforge.jp/hiki.cgi?RubyZilla
Qianjigui
Thanks for the idea, I think it is possible to do the job using the Mozilla lib. Is there any Ruby interface for it?
massinissa
www.youtube.com/watch?v=50a16bSJ8GU
Qianjigui
Looks great! Is it possible to have your Ruby code? Thanks in advance.
massinissa