How can I scrape a page like this?
https://www.procom.ca/JobList.aspx?keywords=&Cities=&reference=&JobType=0
It is served over HTTPS; does it also require a referrer? I can't get anything back using wget or httplib2.
If you go through the page below first, you get a list, and that works in a browser but not from the command line.
https://www.procom.ca/jobsearch.aspx
...
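A minimal sketch, assuming (not confirmed) that the server only checks for a `Referer` header and a session cookie set when the search page is first loaded. It uses the standard library's `urllib.request` with an `http.cookiejar.CookieJar`, so cookies from the first request are replayed on the second; the helper names and the User-Agent string are hypothetical.

```python
import http.cookiejar
import urllib.request

SEARCH_URL = "https://www.procom.ca/jobsearch.aspx"
LIST_URL = "https://www.procom.ca/JobList.aspx?keywords=&Cities=&reference=&JobType=0"

# Assumption: a browser-like User-Agent may help; the exact string is arbitrary.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def make_opener():
    """Build an opener that stores cookies and replays them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_request(url, referer=None):
    """Create a GET request carrying a User-Agent and, optionally, a Referer."""
    headers = {"User-Agent": USER_AGENT}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def fetch_job_list():
    opener, _jar = make_opener()
    # Step 1: visit the search page so any session cookie gets set.
    with opener.open(build_request(SEARCH_URL)) as first:
        first.read()
    # Step 2: request the list, claiming we came from the search page.
    with opener.open(build_request(LIST_URL, referer=SEARCH_URL)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the list still comes back empty, the page may depend on JavaScript or on ASP.NET form state, which this sketch does not handle.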
Hi folks,
I'm having a problem with my python script.
It prints massive amounts of data to the screen, and I would like to suppress all of that output.
Edit:
The library I'm using is mechanize, and it's printing a LOT of data on screen.
I have set these to False, with no luck:
br.set_debug_redirects(False)
br.set_d...
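If the debug switches do not silence everything, a blunt fallback is to capture whatever is written to `stdout`/`stderr` while the noisy call runs. A minimal sketch using only `contextlib`; `run_quietly` is a hypothetical helper, and it assumes the output goes through Python's `sys.stdout`/`sys.stderr`. Output written directly to file descriptors by C code, or through a logging handler that captured the stream object before the call, is not caught.

```python
import contextlib
import io


def run_quietly(func, *args, **kwargs):
    """Call func, diverting everything it prints into a buffer.

    Returns (result, captured_text) so the output can still be inspected
    or logged later instead of flooding the terminal.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_fetch():
    # Stand-in for the chatty library call.
    print("lots of debug output")
    return 42
```

With a real browser object this would look like `response, log = run_quietly(br.open, url)`, where `br` and `url` are placeholders.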
Hi, I'm building a scraper using Python 2.5 and BeautifulSoup,
but I've stumbled upon a problem: part of the web page is generated only
after the user clicks a button, which starts an AJAX request by calling a specific JavaScript function with the proper parameters.
Is there a way to simulate the user interaction and get this result? I came across a mec...
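Rather than simulating the click, a common approach is to find the HTTP request the JavaScript function sends (for example, in the browser's network inspector) and replay it directly. A minimal sketch with the standard library; the endpoint URL and parameter names are placeholders to be copied from the real request, and the `X-Requested-With` header only matters if the server checks for it.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: copy the real URL and parameters from the
# request the page's JavaScript sends when the button is clicked.
AJAX_URL = "https://example.com/ajax/results"


def build_ajax_request(url, params):
    """Build a form-encoded POST that mimics a typical AJAX call."""
    body = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,  # supplying data makes urllib send a POST
        headers={
            "X-Requested-With": "XMLHttpRequest",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_fragment(url, params):
    """Send the request and return parsed JSON or the raw HTML fragment."""
    with urllib.request.urlopen(build_ajax_request(url, params)) as resp:
        text = resp.read().decode("utf-8", errors="replace")
        if resp.headers.get_content_type() == "application/json":
            return json.loads(text)
        return text  # an HTML fragment can go straight into BeautifulSoup
```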
I am trying to access a form field in Mechanize whose name contains awkward characters,
similar to this:
agent = Mechanize.new
page = agent.get('http://domain.com')
form = page.forms[0]
form.ct600$Main$LastNameTextBox = "whatever"
page = agent.submit(form)
The problem is that the $ in the HTML field name breaks Ruby's method-call syntax.
Is there another method...
Could someone point me to some?
...
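The `$` only trips up Ruby's `form.name = value` syntax; to the server the field name is an ordinary string. Ruby Mechanize forms can also be addressed by string key, as in `form['ct600$Main$LastNameTextBox'] = "whatever"`. The Python sketch below (standard library only) shows what ends up in the request body: the `$` is simply percent-encoded as `%24`.

```python
import urllib.parse

# ASP.NET-style control names contain '$'; as dictionary keys they are
# plain strings, so no special syntax is needed to set them.
form_fields = {
    "ct600$Main$LastNameTextBox": "whatever",
}


def encode_form(fields):
    """Encode form fields the way a browser would for a POST body."""
    return urllib.parse.urlencode(fields)
```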
I am trying to get the response codes from Mechanize in Python. I can get a 200 status code, but nothing else is returned: a 404 throws an exception and 30x redirects are followed silently. Is there a way to get the original status code?
Thanks
...
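A sketch of the idea using the standard library's `urllib.request` rather than mechanize: errors such as 404 arrive as `HTTPError`, whose `.code` attribute holds the status, and a redirect handler that declines to follow redirects makes 30x responses surface the same way. mechanize's own `HTTPError` also exposes `.code`, and its `Browser.set_handle_redirect(False)` plays a similar role, though behaviour may differ between versions. `NoRedirect` and `fetch_status` are hypothetical names.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so 30x responses surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "don't follow"; urllib then falls through
        # to the default error handler, which raises HTTPError.
        return None


def fetch_status(url, opener=None):
    """Return the HTTP status code for url, including 30x and 4xx/5xx."""
    opener = opener or urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # HTTPError carries the original status in .code
        return err.code
```

Passing `NoRedirect` to `build_opener` replaces the default redirect handler, because `build_opener` skips any default handler class that a supplied handler subclasses.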
I'm trying to submit a form on an .asp page but Mechanize does not recognize the name of the control. The form code is:
<form id="form1" name="frmSearchQuick" method="post">
....
<input type="button" name="btSearchTop" value="SEARCH" class="buttonctl" onClick="uf_Browse('dledir_search_quick.asp');" >
My code is as follows:
br = mech...
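The `SEARCH` control is `type="button"`, which browsers never submit as form data. The search only happens because the `onClick` handler runs `uf_Browse(...)`, and mechanize does not execute JavaScript. One workaround is to read the target page out of the handler and post the form fields there yourself. A minimal sketch with the standard library, assuming (unconfirmed) that `uf_Browse` simply submits the form to the page named in its argument; `onclick_target` is a hypothetical helper.

```python
import re
import urllib.parse

# Matches the first quoted argument of a JS call such as
# uf_Browse('dledir_search_quick.asp');
_CALL_ARG = re.compile(r"""\w+\(\s*['"]([^'"]+)['"]""")


def onclick_target(onclick, base_url):
    """Return the absolute URL named in an onClick handler, or None."""
    match = _CALL_ARG.search(onclick)
    if match is None:
        return None
    return urllib.parse.urljoin(base_url, match.group(1))
```

The URL-encoded form fields could then be POSTed to that URL, for example with mechanize's `br.open(url, data)`.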
I need to write a Perl script to scrape a website. The website can only be scraped with JavaScript, and the user is on Windows.
I got some way with Win32::IE::Mechanize on my work machine, which has IE6, but then I moved to my netbook which has IE8, and can't even get as far as fetching a simple page.
Is Win32::IE::Mechanize up to d...
Are there any libraries or frameworks that provide the functionality of a browser but don't need to actually render anything to the screen?
I want to automate navigation on web pages (Mechanize does this, for example), but I want the full browser experience, including JavaScript. Thus, I'd like to have a virtual browser of some so...
I want to process all links across the whole website except external ones. Is there an easy way to identify that a link is external and skip it?
My code so far looks like this (the site URL is passed as a command-line argument):
I am using mechanize (0.9.3) and ruby 1.8.6 (2008-08-11 patchlevel 287) [i386-mswin32]
Please note that the...
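Comparing each link's host with the start URL's host is usually enough. A minimal sketch in Python with `urllib.parse` (in Ruby, `URI.parse(url).host` gives the same information). Treating `www.` as equivalent and skipping non-HTTP schemes such as `mailto:` are assumptions that may need adjusting for a particular site.

```python
from urllib.parse import urljoin, urlparse


def _host(url):
    """Lower-cased host name without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_internal(link, site_url):
    """True if link (possibly relative) points to the same site as site_url."""
    absolute = urljoin(site_url, link)  # resolve relative links first
    if urlparse(absolute).scheme not in ("http", "https"):
        return False  # mailto:, javascript:, ftp: and the like are skipped
    return _host(absolute) == _host(site_url)
```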
What if I only want to download a page if it has changed since the last download?
What is the best way? Can I get the size of the page first, compare it to decide whether it has changed, and download it only if it has?
I plan to use (Python) mechanize.
...
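Comparing sizes is unreliable, because a page can change without changing length and many servers omit `Content-Length`. HTTP has conditional requests built in: send back the `ETag` and/or `Last-Modified` values from the previous download as `If-None-Match` / `If-Modified-Since`, and a server that supports them answers `304 Not Modified` with no body. A minimal sketch with the standard library, assuming the server sends those validators; where it does not, the fallback is to download and hash the page.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build validator headers from a previous response's ETag/Last-Modified."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, new_etag, new_last_modified), or None if unchanged."""
    req = urllib.request.Request(url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            return (
                resp.read(),
                resp.headers.get("ETag"),
                resp.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return None
        raise
```

Store the returned `ETag` and `Last-Modified` alongside the saved page and pass them in on the next run.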
I am trying to crawl a site using mechanize.
The site provides search results in different pages.
When posting to get the next set of results, something is wrong and the server redirects me to the first page, asking mechanize to update the SearchSession Cookie.
I have been debugging the requests using Firefox and they look quite the sam...
a = WWW::Mechanize.new { |agent|
agent.user_agent_alias = 'Mac Safari'
agent.history.max_size=0
}
page = a.get('http://livingsocial.com/deals?preferred_city=18')
I'm trying a very basic GET request using Mechanize but get a 500 error, yet with curl I have no problems. Is there a problem with including parameters in a get() call? ...
Hey can someone help with the following?
I'm trying to scrape a site that has the following information. I need to pull just the value after the </strong> tag:
[<li><strong>ISBN-13:</strong> 9780375853401</li>, <li><strong>Pub. Date: </strong> 05/11/2010</li>]
[<li><strong>UPC:</strong> 490355000372</li>, <li><strong>Catalog No:</st...
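One way is to treat each `<li>` as a label/value pair: the label is the text inside `<strong>`, and the value is whatever text follows the closing `</strong>`. With BeautifulSoup this is typically `li.strong.next_sibling`. The sketch below does the same with the standard library's `html.parser` so it runs without extra packages, and assumes the markup matches the sample items shown; `LabelValueParser` and `extract_pairs` are hypothetical names.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect {label: value} from <li><strong>Label:</strong> value</li>."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._in_strong = False
        self._label = None  # label of the current <li>, once seen
        self._value = []    # text chunks after </strong>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._label, self._value = None, []
        elif tag == "strong":
            self._in_strong = True
            self._label = ""

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False
        elif tag == "li" and self._label is not None:
            label = self._label.strip().rstrip(":").strip()
            self.pairs[label] = "".join(self._value).strip()
            self._label = None

    def handle_data(self, data):
        if self._in_strong:
            self._label += data
        elif self._label is not None:
            self._value.append(data)  # text after </strong> inside the <li>


def extract_pairs(html):
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```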
Is there a way to get around the following?
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
Is the only way around this to contact the site owner (barnesandnoble.com)? I'm building a site that would bring them more sales, so I'm not sure why they would deny access at a certain depth.
I'm using mechanize and Beautif...
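That 403 is generated by mechanize itself, not by the server: mechanize downloads the site's `robots.txt` and refuses URLs it disallows, a check controlled by `Browser.set_handle_robots()`. The same check can be reproduced with the standard library's `urllib.robotparser`, which shows in advance which paths are off-limits. The rules below are made up for illustration and are not the site's actual file.

```python
import urllib.robotparser

# Hypothetical rules for illustration; a real crawler would call
# parser.set_url("https://example.com/robots.txt") and parser.read().
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /
"""


def make_parser(robots_txt):
    """Build a RobotFileParser from robots.txt text without network access."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser
```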
I'm just starting to learn Ruby and have a problem with encoding:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
agent.get('myurl.....')
agent.page.search('#reciperesult a').each do |item|
c = Mechanize.new
c.get(item.attributes['href'])
puts c.page.search('#ingredients li').text
end
The output text is shown li...
Hi all,
Can someone show me an example of using mechanize (python version) with PFX certificates?
...
Hi Everyone,
I've been trying to use Mocha to do some stubbing for tests on code using Mechanize. Here is an example method:
def lookup_course subject_area = nil, course = nil, quarter = nil, year = nil
raise ArgumentError, "Subject Area can not be nil" if (subject_area.nil? || subject_area.empty?)
page = get_page FIND_BASIC_...
Hi,
I was wondering if there is something like Perl's/Python's mechanize for Java.
Thanks!
...
Is there a way to fill out a textarea that is part of a form using the mechanize module for Python?
...
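In mechanize a `<textarea>` is handled much like a text input: after `br.select_form(...)`, assigning `br["field_name"] = "text"` should set it (the field name is a placeholder). The standard-library sketch below shows what that amounts to. It collects a form's `input` and `textarea` fields (including a textarea's default text), applies overrides, and encodes the POST body. It deliberately skips buttons, checkboxes, radios, files, and `select` elements, which a real browser would handle; the parser and field names are hypothetical.

```python
import urllib.parse
from html.parser import HTMLParser

SKIPPED_INPUT_TYPES = ("button", "submit", "reset", "image", "checkbox", "radio", "file")


class FormFieldParser(HTMLParser):
    """Collect name -> value for <input> and <textarea> fields of a form."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._textarea = None  # name of the textarea being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if name is None:
            return
        input_type = (attrs.get("type") or "text").lower()
        if tag == "input" and input_type not in SKIPPED_INPUT_TYPES:
            self.fields[name] = attrs.get("value") or ""
        elif tag == "textarea":
            self._textarea = name
            self.fields[name] = ""

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._textarea = None

    def handle_data(self, data):
        if self._textarea is not None:
            self.fields[self._textarea] += data  # textarea default text


def build_post_body(form_html, overrides=None):
    """Parse the form, apply overrides (e.g. a textarea's text), URL-encode."""
    parser = FormFieldParser()
    parser.feed(form_html)
    parser.close()
    parser.fields.update(overrides or {})
    return urllib.parse.urlencode(parser.fields)
```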