Prevent images from downloading with ScrAPI | ansaurus

Q:

Prevent images from downloading with ScrAPI

I need to scrape some websites, and would like to avoid downloading images from the pages I am scraping - I only need the text. I am hoping this will speed up the process. Any ideas on how to manage this?

Thanks, Jon

+2 A:

When you scrape a page you do not download its images. You only fetch the HTML body, which contains the IMG tags that reference them. You can always remove the IMG tags on the server side before storing the content in your database or rendering it to a view. I would suggest using Nokogiri to parse the content you receive and remove all occurrences of the IMG tag.

Removing the tags does not speed up the scrape, however. It's just plain old HTML that is scraped, so the images were never being downloaded in the first place. If you want fast fetching and parsing, go for Feedzirra if you are dealing with feeds, or Typhoeus for fetching just the HTML content.

Shripad K 2010-07-05 08:28:15

Ok, thanks for explaining that. I will have a look at Typhoeus.

CHsurfer 2010-07-05 12:01:07
