Saving the HTML of a web page using Ruby is very easy. One way to do it is by using rio:

require 'rubygems'
require 'rio'
rio('http://www.google.com') > rio('google.html')

It is possible to do the same by parsing the HTML, requesting the different images, JS and CSS files again, and then saving each of them, but I think that is not very efficient. So, is there a way to save a web page plus all the images, CSS and JavaScript related to that page, all of it automatically?
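For reference, the manual approach described above could look roughly like the sketch below, assuming the nokogiri and open-uri libraries (neither appears in the rio example above); the output directory and error handling are only illustrative:

require 'open-uri'
require 'nokogiri'
require 'fileutils'
require 'uri'

page_url = 'http://www.google.com'
out_dir  = 'saved_page'          # illustrative output directory
FileUtils.mkdir_p(out_dir)

# Save the page's own HTML first.
html = URI.open(page_url).read
File.write(File.join(out_dir, 'index.html'), html)

# Collect the images, scripts and stylesheets the page references.
doc    = Nokogiri::HTML(html)
assets = doc.css('img[src], script[src]').map { |n| n['src'] } +
         doc.css('link[rel="stylesheet"][href]').map { |n| n['href'] }

# Download each asset next to the HTML, skipping anything that fails.
assets.compact.uniq.each do |ref|
  asset_url = URI.join(page_url, ref).to_s   # resolve relative references
  filename  = File.basename(URI(asset_url).path)
  next if filename.empty?
  File.binwrite(File.join(out_dir, filename), URI.open(asset_url).read)
rescue OpenURI::HTTPError, URI::InvalidURIError, Errno::ENOENT
  next
end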

+1  A: 

What about system("wget -r -l 1 http://google.com")?
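Expanding on that a little (a sketch only, not part of the original answer): wget's -p (--page-requisites) flag also fetches the images, CSS and JavaScript the page needs, and Kernel#system returns true or false, so the call can be checked from Ruby:

url = 'http://www.google.com'
# -p fetches page requisites, -k rewrites links for local viewing,
# -E adjusts file extensions; passing arguments separately avoids shell quoting issues.
ok = system('wget', '-p', '-k', '-E', url)
warn "wget failed for #{url}" unless ok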

dimus
A: 

Most of the time we can use the system's tools. As dimus said, you can use wget to download the page.

And there are many useful APIs for network problems, such as net/ftp, net/http or net/https; see the Net::HTTP documentation for details. But these methods only get the response; what we still need to do is parse the HTML document. Going further, using Mozilla's lib is a good way.
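For example, a minimal net/http fetch (the URL and output filename are only placeholders) shows what "only getting the response" means, since none of the referenced assets are downloaded:

require 'net/http'
require 'uri'

uri  = URI('http://www.google.com')
body = Net::HTTP.get(uri)        # returns the response body as a String
File.write('google.html', body)  # the page's images/CSS/JS are not fetched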

Qianjigui
As for parsing the HTML of a web page, I'm using Nokogiri. However, it only retrieves the HTML, not the images, CSS or JS. The system wget is good, but I cannot control the whole process directly from Ruby. Is there an equivalent of wget in Ruby?
massinissa
I have written a simple web browser with Mozilla's lib in Ruby. Maybe this is a good way to solve the problem?
Qianjigui
http://ruby-gnome2.sourceforge.jp/hiki.cgi?RubyZilla
Qianjigui
Thanks for the idea, I think it is possible to do the job using the Mozilla lib. Is there any Ruby interface for it?
massinissa
www.youtube.com/watch?v=50a16bSJ8GU
Qianjigui
Looks great! Is it possible to have your Ruby code? Thanks in advance.
massinissa