mechanize

How to add JS support to the Ruby Mechanize gem?

Hello, is there a way to add JavaScript support to mechanize so that it will handle simple redirection like "document.location.href="? TIA ...
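Mechanize has no JavaScript engine. For simple script redirects, a common workaround is to pull the target URL out of the page source yourself and then request it. Below is a minimal Python sketch using only the standard library. The regex is a heuristic that assumes the URL is a plain string literal, so it will miss redirects that build the URL dynamically:

```python
import re
from urllib.parse import urljoin

# Heuristic match for assignments such as document.location.href="..." or
# window.location = '...'. This is pattern matching, not a JavaScript engine.
JS_REDIRECT = re.compile(
    r"""(?:document|window)\.location(?:\.href)?\s*=\s*(['"])(.*?)\1"""
)


def find_js_redirect(html, base_url):
    """Return the absolute redirect target found in inline script, or None."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # Relative targets are resolved against the URL of the page that held the script.
    return urljoin(base_url, match.group(2))
```

Open the returned URL with the same Mechanize agent so that any cookies carry over.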

Web scraping sites that require javascript support

Possible duplicate: Screen Scraping from a web page with a lot of Javascript. I just want to do tasks such as form entry and web scraping, but on sites that require JavaScript support. I also need to fill in forms, scrape, and so on within the same session. Ideally, I'd like a way to control a web browser from the command line. And...

Mechanize and BeautifulSoup for PHP?

I was wondering if there is anything similar to Mechanize or BeautifulSoup for PHP. ...

mechanize (python) click on a javascript type link

Hi, is it possible to have mechanize follow an anchor link whose href is a javascript: URL? I am trying to log in to a website in Python using mechanize and BeautifulSoup. This is the anchor link: <a id="StaticModuleID15_ctl00_SkinLogin1_Login1_Login1_LoginButton" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&...
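Mechanize cannot run a javascript: href. An ASP.NET postback link, however, only fills in the hidden __EVENTTARGET and __EVENTARGUMENT fields and then submits the page's form, so those values can be extracted and set by hand. The sketch below assumes the standard WebForm_PostBackOptions argument order (event target first, event argument second). The target string in the test is a made-up example:

```python
import html
import re

# First two string arguments of WebForm_PostBackOptions(...): event target, event argument.
POSTBACK_OPTIONS = re.compile(
    r'WebForm_PostBackOptions\(\s*"([^"]*)"\s*,\s*"([^"]*)"'
)


def postback_fields(href):
    """Return the hidden-field values an ASP.NET postback link would submit."""
    decoded = html.unescape(href)  # turns &quot; back into "
    match = POSTBACK_OPTIONS.search(decoded)
    if match is None:
        raise ValueError("not a WebForm_DoPostBackWithOptions link")
    target, argument = match.groups()
    return {"__EVENTTARGET": target, "__EVENTARGUMENT": argument}
```

These values would then typically go into the login form's hidden controls, which can be made writable with br.form.set_all_readonly(False), before the form is submitted.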

Using Python Mechanize like "Tamper Data"

I'm writing a web testing script in Python 2.6 with mechanize 0.1.11. The page I'm working with has an HTML form with a select field like this:
    <select name="field1" size="1">
      <option value="A" selected>A</option>
      <option value="B">B</option>
      <option value="C">C</option>
      <option value="D">D</option>
    </select>
In ...
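Tamper Data edits a request after the browser has built it. The server only ever sees the encoded form body, so a value outside the select's options (A to D) can be sent by rewriting that body. Here is a standard-library sketch. The submit=Go field and the value E are illustrative assumptions:

```python
from urllib.parse import parse_qsl, urlencode


def tamper(body, **overrides):
    """Rewrite an encoded form body, replacing (or adding) selected fields.

    Converting to a dict keeps only the last value of a repeated field name.
    """
    fields = dict(parse_qsl(body, keep_blank_values=True))
    fields.update(overrides)  # existing keys keep their original position
    return urlencode(fields)


print(tamper("field1=A&submit=Go", field1="E"))  # field1=E&submit=Go
```

With Python mechanize, a hand-built body like this can be posted with br.open(url, data), where url is the form's action.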

upload file with Python Mechanize

When I run the following script:
    from mechanize import Browser
    br = Browser()
    br.open(url)
    br.select_form(name="edit_form")
    br['file'] = 'file.txt'
    br.submit()
I get:
    ValueError: value attribute is readonly
And I still get the same error when I add:
    br.form.set_all_readonly(False)
So, how can I use Python Mechanize to interact wit...
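In mechanize, a file control is normally filled with the form's add_file() method rather than by assigning a string, for example br.form.add_file(open('file.txt'), 'text/plain', 'file.txt'). The standard-library sketch below builds a multipart/form-data body by hand, to show what such an upload sends. The extra "title" field is hypothetical:

```python
import uuid


def multipart_body(fields, file_field, filename, content, content_type="text/plain"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    # The file part carries a filename and its own Content-Type.
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        content,
        f"--{boundary}--".encode(),  # closing boundary
        b"",
    ]
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"
```

The body and header can be sent with urllib.request.Request(url, data=body, headers={"Content-Type": content_type}).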

Ignore Iconv::IllegalSequence while using Ruby WWW::Mechanize

Hello, I've encountered the Iconv::IllegalSequence error on some web pages when using the mechanize library. Is there a way to make mechanize simply omit ill-encoded characters and return the "cut" page? I'm aware of the related thread, but I'd rather discard some characters on the page than re-implement encoding guessing. TIA ...
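The general technique is to decode with an error handler that drops invalid byte sequences instead of raising. Here it is sketched in Python with the standard library. The charset regex is a rough heuristic, and the UTF-8 fallback is an assumption:

```python
import re

# Rough heuristic: first charset=... declaration anywhere in the raw bytes.
CHARSET = re.compile(rb"""charset=["']?([A-Za-z0-9_-]+)""", re.I)


def decode_page(raw, default="utf-8"):
    """Decode page bytes with the declared charset, silently dropping bad bytes."""
    match = CHARSET.search(raw)
    encoding = match.group(1).decode("ascii") if match else default
    try:
        return raw.decode(encoding, errors="ignore")
    except LookupError:  # unknown charset name: fall back to the default
        return raw.decode(default, errors="ignore")
```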

How can I scrape this frame?

If you visit this link right now, you will probably get a VBScript error. On the other hand, if you visit this link first and then the above link (in the same session), the page comes through. The way this application is set up, the first page is meant to serve as a frame in the second (main) page. If you click around a bit, you'll see...
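The frame page only works after the main page has set up the session. One approach is therefore to fetch the main page first, collect its frame URLs, and then request each one with the same cookie-keeping agent. Here is a standard-library Python sketch of the collecting step. The page names in the test are placeholders:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect absolute URLs of <frame> and <iframe> elements in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.frames.append(urljoin(self.base_url, src))


def frame_urls(html, base_url):
    parser = FrameCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.frames
```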

Mechanize Iconv::IllegalSequence when trying to form POST query

Hello, the following code raises the aforementioned error. How can I fix it?
    require 'mechanize'
    m = WWW::Mechanize.new
    p = m.get('http://art-mobile.com.ua/register.php')
    f = p.forms.first
    f.submit(f.buttons.last)
Just in case, here is the full description of the error on my box:
    D:/ruby/lib/ruby/gems/1.9.1/gems/mechanize-0.9.3/lib/w...
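A related point when posting: form values have to be encoded in the charset the server expects. Characters that charset cannot represent need an explicit policy rather than an exception. Below is a standard-library Python sketch. cp1251 in the test is an example charset, not taken from the question:

```python
from urllib.parse import urlencode


def encode_form(fields, charset):
    """URL-encode form fields in the target page's charset.

    Characters the charset cannot represent become '?' instead of raising.
    """
    return urlencode(fields, encoding=charset, errors="replace")
```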

Ruby mechanize post with header

I have a page with JS that posts data via XMLHttpRequest, and the server-side script checks for this header. How do I send this header?
    agent = WWW::Mechanize.new { |a|
      a.user_agent_alias = 'Mac Safari'
      a.log = Logger.new('./site.log')
    }
    agent.post('http://site.com/board.php', { 'act' => '_get_page', "gid" => 1, 'order' => 0, ...
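The question does not name the header. The one commonly checked for XMLHttpRequest calls is X-Requested-With: XMLHttpRequest, and that is an assumption here. A standard-library Python sketch that builds such a POST request (building it does not send anything):

```python
from urllib.parse import urlencode
from urllib.request import Request


def ajax_post_request(url, fields):
    """Build a form POST that carries the header browsers' AJAX libraries often add."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,  # a non-None body makes this a POST
        headers={
            "X-Requested-With": "XMLHttpRequest",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending it is then a matter of urllib.request.urlopen(req).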

How to get redirect log in Mechanize?

Hello. In Ruby, if you use Mechanize to follow 301/302 redirects like this:
    require 'mechanize'
    m = WWW::Mechanize.new
    m.get('http://google.com')
how can you get the list of pages Mechanize was redirected through? (Like http://google.com => http://www.google.com => http://google.com.ua) OK, here is the code in mechanize responsible fo...
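Python's standard library offers a comparable hook: subclass urllib.request.HTTPRedirectHandler and record each redirect before delegating to the default behaviour. In the sketch below, the chain attribute and the helper function are this example's own names, not library API:

```python
from urllib.request import HTTPRedirectHandler, build_opener


class RedirectLogger(HTTPRedirectHandler):
    """Redirect handler that records each redirect response it handles."""

    def __init__(self):
        super().__init__()
        self.chain = []  # (status, from_url, to_url) per hop

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.chain.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def opener_with_redirect_log():
    logger = RedirectLogger()
    # Passing an instance replaces the default HTTPRedirectHandler.
    return build_opener(logger), logger
```

Hypothetical usage: after opener.open("http://google.com"), logger.chain lists one tuple per hop.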

How can I encode spaces as %20 in a POST from WWW::Mechanize?

I'm using WWW::Mechanize to do some standard website traversal, but at one point I have to construct a special POST request and send it off. All this requires session cookies. In the POST request I'm making, spaces are being encoded to + symbols, but I need them encoded as a %20. I can't figure out how to alter this behaviour. I realis...
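For comparison, Python's standard library exposes this choice through the quote_via parameter of urllib.parse.urlencode. The default, quote_plus, produces "+", while quote produces "%20". A minimal sketch:

```python
from urllib.parse import quote, urlencode


def encode_with_percent20(fields):
    """URL-encode form fields with spaces as %20 instead of '+'."""
    return urlencode(fields, quote_via=quote)
```

The resulting string can be used as the body of a hand-built POST.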

Mechanize and Google App Engine

Has anyone managed to use mechanize in a Google App Engine application? ...

python and mechanize.open()

I have some code that uses mechanize with a password-protected site. I can log in just fine and get the results I expect. However, once I'm logged in, I don't want to "click" links; I want to iterate through a list of URLs. Unfortunately, each .open() call simply gets a redirect to the login page, which is the behaviour I would expect if I ha...
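Every request bouncing back to the login page is what typically happens when the session cookie set at login is not sent with later requests. The standard-library Python sketch below keeps one cookie jar for the login and for every subsequent fetch. The login URL and credential field names are placeholders to be taken from the real form:

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


def make_session():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def login_then_fetch(opener, login_url, credentials, urls):
    """Log in once, then fetch each URL with the same cookie-carrying opener."""
    opener.open(login_url, urlencode(credentials).encode()).close()
    pages = []
    for url in urls:
        with opener.open(url) as response:
            # geturl() shows the final URL, so a bounce to the login page is visible.
            pages.append((response.geturl(), response.read()))
    return pages
```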

How to click a link that has javascript:__doPostBack in href?

I am writing a screen-scraper script in Python with the mechanize module, and I would like to use the mechanize.click_link() method on a link that has javascript:__doPostBack in its href. I believe the page I am trying to parse is using AJAX. Note: mech is the mechanize.Browser().
    >>> next_link.__class__.__name__
    'Link'
    >>> next_link
    Link(bas...
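Mechanize cannot click a javascript:__doPostBack('target','argument') link. What that function does is simple, though: it copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the ASP.NET form. The sketch below rebuilds that POST body with the Python standard library. The hidden_fields argument (for example __VIEWSTATE) is assumed to be copied from the page's form, and the targets in the test are made up:

```python
import re
from urllib.parse import unquote, urlencode

DO_POSTBACK = re.compile(r"__doPostBack\('([^']*)',\s*'([^']*)'\)")


def postback_body(href, hidden_fields):
    """Build the urlencoded POST body that __doPostBack(target, arg) would send.

    hidden_fields holds the form's other hidden inputs, e.g. __VIEWSTATE.
    """
    match = DO_POSTBACK.search(unquote(href))
    if match is None:
        raise ValueError("href does not call __doPostBack")
    data = dict(hidden_fields)
    data["__EVENTTARGET"], data["__EVENTARGUMENT"] = match.groups()
    return urlencode(data)
```

The body would then be posted to the form's action URL in the same session.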

Is there a way to test Comet applications without a running browser?

I'm trying to connect to an application that uses Comet and is pretty heavy on JavaScript. I've gone as far as I can with Firebug and HTTP header examination, and am trying to see what's coming over the wire by writing something with Ruby Mechanize. However, since I have no client runtime, my approach is to mimic the HTTP re...
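Comet is often implemented with long polling, but that is only one transport; streaming and Bayeux-style protocols differ. Under that assumption, mimicking the client mostly means reissuing the same blocking request in a loop. In this Python sketch, fetch is a hypothetical callable you would build from the requests observed in Firebug:

```python
import time


def long_poll(fetch, handle, max_polls=None, backoff=1.0):
    """Repeatedly issue long-poll requests, as a Comet client would.

    fetch() performs one request that blocks until the server has data
    (or times out) and returns a list of messages; handle(msg) processes one.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            messages = fetch()
        except OSError:  # includes urllib's URLError and socket timeouts
            time.sleep(backoff)  # wait, then reconnect
            continue
        for message in messages:
            handle(message)
    return polls
```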

Mechanize setting a field with a duplicate name...

I'm using mechanize and have a problem with one form: it has two select boxes with the same name. How can I select the second one, i.e. the second occurrence of NumNights? I found something like this in the docs:
    form.set_fields( :foo => ['bar', 1] )
but this didn't work:
    form.field_with(:name => [ 'NumNights', 2 ]).options[no_days...
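A form with two controls sharing a name submits both as separate name=value pairs, in document order. It therefore has to be treated as an ordered list rather than a dictionary. (Ruby Mechanize's form API also offers fields_with(:name => 'NumNights'), which returns every match, so index 1 would be the second; check this against your gem version.) Below is a standard-library Python sketch of the underlying idea. The Arrival field and the values are hypothetical:

```python
from urllib.parse import urlencode


def set_nth(fields, name, n, value):
    """Set the n-th (0-based) occurrence of `name` in an ordered field list."""
    seen = -1
    for i, (key, _) in enumerate(fields):
        if key == name:
            seen += 1
            if seen == n:
                fields[i] = (key, value)
                return fields
    raise KeyError(f"{name!r} occurrence {n} not found")


form = [("NumNights", "1"), ("Arrival", "2010-01-01"), ("NumNights", "1")]
print(urlencode(set_nth(form, "NumNights", 1, "3")))
# NumNights=1&Arrival=2010-01-01&NumNights=3
```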

Mechanize RuntimeError in Ruby 1.8.7

ruby mech.rb
    /usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize/chain/uri_resolver.rb:53:in `handle': need absolute URL (RuntimeError)
    from /usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize/chain.rb:25:in `handle'
    from /usr/lib/ruby/gems/1.8/gems/mechanize-0.9.3/lib/www/mechanize.rb:457:in `fetch_page'
    f...
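The message suggests that a relative URL, or one missing its http:// scheme, reached Mechanize with nothing to resolve it against. The usual remedy is to resolve every link against an absolute base URL before requesting it. Here is a standard-library Python sketch of that resolution step:

```python
from urllib.parse import urljoin, urlparse


def absolute(url, base):
    """Resolve `url` against `base`, leaving already-absolute URLs unchanged."""
    if urlparse(url).scheme:
        return url
    if not urlparse(base).scheme:
        raise ValueError("base URL must be absolute, e.g. http://host/path")
    return urljoin(base, url)
```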

Using mechanize to visit a site that requires SSL

I need to visit a site (https://*) that requires me to install two certificates in Firefox before I can visit it successfully. One I can export as a .p12 file (Client Certificate), and one is a .crt file (CA Certificate). If I try accessing this site without these certificates, I get a "failed handshake error". How do I visit this site ...
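With Python's ssl module (not mechanize), the equivalent setup is an SSLContext that trusts the CA certificate and presents the client certificate. The ssl module reads PEM files, not PKCS#12. The .p12 file would first need converting, for example with openssl pkcs12 -in client.p12 -out client.pem -nodes, which writes the private key unencrypted. A DER-encoded .crt would likewise need converting to PEM. A sketch; the file names are placeholders:

```python
import ssl


def client_cert_context(ca_file=None, cert_file=None, key_file=None):
    """Build an SSLContext that trusts a private CA and presents a client cert.

    ca_file, cert_file and key_file must be PEM files; with key_file=None the
    private key is expected inside cert_file.
    """
    context = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```

The context can then be passed to urllib.request.urlopen(url, context=ctx).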

How to search XPath inside Python ClientForm object?

Hello, I've got a form returned by the Python mechanize Browser via its forms() method. How can I perform an XPath search inside the form node, that is, among the descendant nodes of the HTML form element? TIA Update: how can I save the HTML code of the form? ...
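If the form's markup is available as text, the standard library's ElementTree supports a limited XPath subset (such as .//input) on well-formed XML. Real-world HTML usually needs a forgiving parser instead, such as lxml.html with its xpath() method. A sketch under the well-formed assumption; ET.tostring(element, encoding="unicode") serializes an element back to markup:

```python
import xml.etree.ElementTree as ET


def form_inputs(form_markup):
    """Return (name, value) for every <input> below the form element.

    Works only when the form's markup is well-formed XML (e.g. XHTML).
    """
    form = ET.fromstring(form_markup)
    # ".//input" is ElementTree's limited XPath: all descendant <input> elements.
    return [(el.get("name"), el.get("value")) for el in form.iterfind(".//input")]
```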