mechanize

Scraping a page from a secure URL which is possibly using a session ID

How can I scrape a page like this? https://www.procom.ca/JobList.aspx?keywords=&Cities=&reference=&JobType=0 It is secure, and does it require a referrer? I can't get anything using wget or httplib2. If you go through this page, you get a list, and it works in a browser but not from the command line: https://www.procom.ca/jobsearch.aspx ...

Silence loggers and printing to screen - Python

Hi folks, I'm having a problem with my Python script. It's printing massive amounts of data to the screen, and I would like to prevent all printing to the screen. Edit: The library I'm using is mechanize, and it's printing a LOT of data on screen. I have set these to False with no luck: br.set_debug_redirects(False) br.set_d...
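
One possible approach, sketched with the standard `logging` module only: silence a library's named logger so that its records are neither handled nor passed up to the root logger. The logger name "mechanize" is an assumption about how the library names its logger, and `silence_logger` is a hypothetical helper, not part of mechanize.

```python
import logging


def silence_logger(name):
    """Stop a named logger from emitting records or passing them upward."""
    logger = logging.getLogger(name)
    logger.handlers = [logging.NullHandler()]  # replace any handlers attached earlier
    logger.propagate = False                   # do not pass records to the root logger
    logger.setLevel(logging.CRITICAL + 1)      # above every standard level
    return logger


# Assumption: the library logs under the name "mechanize".
quiet = silence_logger("mechanize")
```

This only affects output that goes through `logging`; anything the library writes directly with `print` would need a different fix.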

beautifulsoup and mechanize to get ajax call result

Hi, I'm building a scraper using Python 2.5 and BeautifulSoup, but I've stumbled upon a problem: part of the web page is generated after the user clicks a button, which starts an AJAX request by calling a specific JavaScript function with the proper parameters. Is there a way to simulate user interaction and get this result? I came across a mec...

Mechanize complex form input name

I am trying to access a form in Mechanize with ugly characters in the object name, similar to this: agent = Mechanize.new page = agent.get('http://domain.com') form = page.forms[0] form.ct600$Main$LastNameTextBox = "whatever" page = agent.submit(form) The problem is that the $ in the HTML name is messing with Ruby. Is there another method...

Are there any alternatives to Mechanize in Python?

Could someone point me to some? ...

Getting and trapping HTTP response using Mechanize in Python

I am trying to get the response codes from Mechanize in Python. While I am able to get a 200 status code, anything else isn't returned (404 throws an exception and 30x is ignored). Is there a way to get the original status code? Thanks ...

Using Python and Mechanize with ASP Forms

I'm trying to submit a form on an .asp page but Mechanize does not recognize the name of the control. The form code is: <form id="form1" name="frmSearchQuick" method="post"> .... <input type="button" name="btSearchTop" value="SEARCH" class="buttonctl" onClick="uf_Browse('dledir_search_quick.asp');" > My code is as follows: br = mech...

How can I use Perl to scrape a website that reveals its content with JavaScript?

I need to write a Perl script to scrape a website. The website can only be scraped with JavaScript, and the user is on Windows. I got some way with Win32::IE::Mechanize on my work machine, which has IE6, but then I moved to my netbook which has IE8, and can't even get as far as fetching a simple page. Is Win32::IE::Mechanize up to d...

Javascript (and HTML rendering) engine without a GUI for automation?

Are there any libraries or frameworks that provide the functionality of a browser, but do not need to actually render physically onto the screen? I want to automate navigation on web pages (Mechanize does this, for example), but I want the full browser experience, including Javascript. Thus, I'd like to have a virtual browser of some so...

process all links but external ones (ruby + mechanize)

I want to process all links except external ones across the whole web site. Is there an easy way to identify that a link is external and skip it? My code so far looks like this (the site URL is passed as a command-line argument). I am using mechanize (0.9.3) and ruby 1.8.6 (2008-08-11 patchlevel 287) [i386-mswin32] Please note that the...
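
The underlying idea is language-independent: resolve each link against the site URL and compare host names. Here is a minimal sketch in Python (the question itself uses Ruby). `is_external` is a hypothetical helper and the URLs are made-up examples.

```python
from urllib.parse import urljoin, urlparse


def is_external(link, site_url):
    """Return True when link resolves to a different host than site_url."""
    absolute = urljoin(site_url, link)  # resolve relative links first
    # Non-HTTP links such as mailto: have no hostname and count as external.
    return urlparse(absolute).hostname != urlparse(site_url).hostname


site = "http://example.com/index.html"  # hypothetical start URL
links = [
    "/about",
    "contact.html",
    "http://example.com/jobs",
    "http://other.example.org/page",
    "mailto:someone@example.com",
]
internal = [link for link in links if not is_external(link, site)]
```

With this exact comparison, `www.example.com` and `example.com` count as different hosts; whether that is desirable depends on the site.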

Can we only get the web page header information and not the body? (Mechanize)

What if I only need to download the page if it has changed since the last download? What is the best way? Can I get the size of the page first, then compare it to decide whether it has changed, and download only if so? I plan to use (Python) mechanize. ...
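
HTTP itself offers two relevant mechanisms: a HEAD request returns only the headers, and conditional headers such as If-Modified-Since let the server report that nothing has changed. The sketch below only builds such a request, using the standard library's `urllib.request` rather than mechanize. `build_conditional_request`, the URL, and the date are hypothetical.

```python
import urllib.request


def build_conditional_request(url, last_modified=None, etag=None):
    """Build a HEAD request that asks whether the page changed since last time."""
    req = urllib.request.Request(url, method="HEAD")  # headers only, no body
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    if etag:
        req.add_header("If-None-Match", etag)
    return req


# Hypothetical values saved from an earlier download.
req = build_conditional_request(
    "http://example.com/page.html",
    last_modified="Sat, 01 May 2010 00:00:00 GMT",
)
```

The request is not sent here. Whether the server honours these headers, or reports a meaningful Content-Length, varies from site to site.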

Python Mechanize unable to avoid redirect when Post

I am trying to crawl a site using mechanize. The site provides search results in different pages. When posting to get the next set of results, something is wrong and the server redirects me to the first page, asking mechanize to update the SearchSession Cookie. I have been debugging the requests using Firefox and they look quite the sam...

Ruby Mechanize - Basic Get Failing

a = WWW::Mechanize.new { |agent| agent.user_agent_alias = 'Mac Safari' agent.history.max_size=0 } page = a.get('http://livingsocial.com/deals?preferred_city=18') I'm trying a very basic GET request using Mechanize but get a 500, yet when I use curl I have no problems. Is there a problem with including parameters in a get() call? ...

grabbing a substring while scraping with Python2.6

Hey, can someone help with the following? I'm trying to scrape a site that has the following information. I need to pull just the number after the </strong> tag. [<li><strong>ISBN-13:</strong> 9780375853401</li>, <li><strong>Pub. Date: </strong> 05/11/2010</li>] [<li><strong>UPC:</strong> 490355000372</li>, <li><strong>Catalog No:</st...
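
A minimal sketch of one way to do this with a regular expression, assuming each item is available as a string shaped like the complete examples in the question. `text_after_strong` is a hypothetical helper. If the items are BeautifulSoup tags, they would first need converting with `str()`.

```python
import re

# Sample items copied from the complete entries in the question.
items = [
    "<li><strong>ISBN-13:</strong> 9780375853401</li>",
    "<li><strong>Pub. Date: </strong> 05/11/2010</li>",
    "<li><strong>UPC:</strong> 490355000372</li>",
]

# Capture the text between the closing </strong> and the closing </li>.
AFTER_STRONG = re.compile(r"</strong>\s*([^<]+?)\s*</li>")


def text_after_strong(html):
    """Return the text that follows </strong>, or None if the pattern is absent."""
    match = AFTER_STRONG.search(html)
    return match.group(1) if match else None


values = [text_after_strong(item) for item in items]
```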

Screen scraping: getting around "HTTP Error 403: request disallowed by robots.txt"

Is there a way to get around the following? httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt Is the only way around this to contact the site owner (barnesandnoble.com)? I'm building a site that would bring them more sales, so I'm not sure why they would deny access at a certain depth. I'm using mechanize and Beautif...

Ruby encoding problem

I'm just starting to learn Ruby and have a problem with encoding: require 'rubygems' require 'mechanize' agent = Mechanize.new agent.get('myurl.....') agent.page.search('#reciperesult a').each do |item| c = Mechanize.new c.get(item.attributes['href']) puts c.page.search('#ingredients li').text end The output text is shown li...

mechanize with pfx certificate

Hi all, Can someone show me an example of using mechanize (python version) with PFX certificates? ...

Stubbing tests when using Ruby Mechanize

Hi Everyone, I've been trying to use Mocha to do some stubbing for tests on code using Mechanize. Here is an example method: def lookup_course subject_area = nil, course = nil, quarter = nil, year = nil raise ArgumentError, "Subject Area can not be nil" if (subject_area.nil? || subject_area.empty?) page = get_page FIND_BASIC_...

mechanize for Java

Hi, I was wondering if there is something like Perl's/Python's mechanize for Java. Thanks! ...

Filling textarea with Python mechanize module

Is there a way to fill out a textarea that is part of a form using the mechanize module for Python? ...