mechanize

How to get an HTTP page using mechanize cookies?

Hello. I have a Python mechanize object with a form that has almost all values set but is not yet submitted. Now I want to fetch another page using the cookies from the mechanize instance, but without resetting the page, forms and so on, i.e. so that the values remain set (I just need to get the body string of another page, nothing else). So is there a ...
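
One possible direction for the question above, sketched with the Python 3 standard library only: build a second, independent opener around the same cookie jar, so a side request reuses the session cookies without touching the browser's current page or form. This is a minimal sketch, not a confirmed mechanize recipe. It assumes the jar used by the browser is one you created yourself and handed over (for example through `Browser.set_cookiejar()`) and that it is compatible with `urllib.request.HTTPCookieProcessor`; `build_side_opener` and `fetch_body` are hypothetical helper names.

```python
import http.cookiejar
import urllib.request


def build_side_opener(cookie_jar):
    """Return an opener that reads and writes cookies in cookie_jar.

    Assumption: cookie_jar is the same jar object the browser uses,
    so both share one set of session cookies.
    """
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )


def fetch_body(opener, url, timeout=10):
    """Fetch url through the side opener and return the raw body bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Usage sketch: the jar would normally be the one shared with the browser.
shared_jar = http.cookiejar.CookieJar()
side_opener = build_side_opener(shared_jar)
```

Because the side opener is a separate object, calling `fetch_body(side_opener, url)` leaves whatever form is selected in the browser untouched; only the cookie jar is shared.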

How to fix this ClientForm bug?

Hello, the following code from mechanize import Browser br = Browser() page = br.open('http://wow.interzet.ru/news.php?readmore=23') br.form = br.forms().next() print br.form gives me the following error: Traceback (most recent call last): File "C:\Users\roddik\Desktop\mech.py", line 6, in <module> br.form = br.forms().next() ...

How to fix encoding in Python Mechanize?

Hello, here is the sample code: from mechanize import Browser br = Browser() page = br.open('http://hunters.tclans.ru/news.php?readmore=2') br.form = br.forms().next() print br.form The problem is that the server returns an incorrect encoding (windows-cp1251). How can I manually set the encoding of the current page in mechanize? Error: Tra...
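
A hedged sketch related to the question above, assuming the failure comes from a charset label that Python does not recognize (the traceback in the excerpt is truncated, so this is a guess) and that the raw response bytes are available. It does not use any mechanize API; it only shows how a declared label could be mapped to a real codec before decoding. The `ALIASES` table and both helper names are hypothetical.

```python
import codecs

# Hypothetical mapping from labels seen in the wild to real codec names.
ALIASES = {"windows-cp1251": "cp1251"}


def resolve_codec(label, default="utf-8"):
    """Map a declared charset label to a codec name Python can use."""
    name = ALIASES.get(label.strip().lower(), label)
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Unknown label: fall back instead of raising.
        return default


def decode_body(raw_bytes, declared_label):
    """Decode raw page bytes using the corrected codec."""
    return raw_bytes.decode(resolve_codec(declared_label), errors="replace")


sample = "Новости".encode("cp1251")
print(decode_body(sample, "windows-cp1251"))
```

The fallback to UTF-8 with `errors="replace"` is a design choice for robustness; a stricter script might prefer to raise on unknown labels.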

Python mechanize doesn't click a button

Hello, check the following script: from mechanize import Browser br = Browser() page = br.open('http://scottishladiespool.com/register.php') br.select_form(nr = 5) r = br.click(type = "submit", nr = 0) print r.data #prints username=&password1=&password2=&email=&user_hide_email=1&captcha_code=&user_msn=&user_yahoo=&user_web=&user_loca...

How to set a nonexistent field in Python ClientForm?

Hello. I'm using mechanize (which uses ClientForm) for some web crawling in Python, and since it doesn't support JS, I want to set the value of a nonexistent input in a form (the input is generated by JS). How can I do this? The error is similar to the one you get if you try to execute from mechanize import Browser br = Browser() page = b...

Ruby - Working with Mechanize::File response without saving to disk

Hi all, I'm working on my first ORM project and am using Mechanize. Here's the situation: I'm downloading a zip file from my website into a Mechanize::File object. Inside the zip is a file buried three folders deep (folder_1/folder_2/file.txt). I'd like to pull file.txt out of the zip file and return that instead of the zip file itse...

What pure Python library should I use to scrape a website?

I currently have some Ruby code used to scrape some websites. I was using Ruby because at the time I was using Ruby on Rails for a site, and it just made sense. Now I'm trying to port this over to Google App Engine, and keep getting stuck. I've ported Python Mechanize to work with Google App Engine, but it doesn't support DOM inspecti...

Issues with mechanize follow_link() and back()

I've run into a problem with mechanize following links. Here's a snippet of what I'm aiming to do: for link in mech.links(url_regex='/test/'): mech.follow_link(link) // Do some processing on that link mech.back() According to the mechanize examples, this should work just fine. However, it doesn't. Despite calling .back(), the...

Ruby Nokogiri Parsing HTML table

I am using mechanize/nokogiri and need to parse the following HTML string. Can anyone help me with the XPath syntax to do this, or any other methods that would work? <table> <tr class="darkRow"> <td> <span> <a href="?x=mSOWNEBYee31H0eV-V6JA0ZejXANJXLsttVxillWOFoykMg5U65P4x7FtTbsosKRbbBPuYvV8nPhET7b5sFeON4aWpbD10Dq"> <span...

mechanize can't login python

I'm writing an auto-login script using Python mechanize. I have used mechanize before with no problems, but I can't get it to work on www.gmarket.co.kr. Whenever I try to log in, the login page is returned, even with the correct Gmarket ID and password. I also saw a suspicious message: "<script language=javascript>top.locatio...

Is it possible to hook up a more robust HTML parser to Python mechanize?

I am trying to parse and submit a form on a website using mechanize, but it appears that the built-in form parser cannot detect the form and its elements. I suspect that it is choking on poorly formed HTML, and I'd like to try pre-parsing it with a parser better designed to handle bad HTML (say lxml or BeautifulSoup) and then feeding the...

mechanize python click a button

I have a form with an <input type="button" name="submit" /> button and would like to be able to click it. I have tried mech.form.click("submit") but that gives the following error: ControlNotFoundError: no control matching kind 'clickable', id 'submit' mech.submit() also doesn't work since its type is button and not submit. Any ideas? ...

TypeError: ListControl, must set a sequence (python error)

I am using Python Mechanize to open a website, fill out a form, and submit that form. It's actually pretty simple. It works until I come across radio buttons and "select" input boxes. br.open(url) br.select_form(name="postmsg") br.form['subject'] = "Is this good for the holidays? " br.form['message'] = "I'm new to technology." br.form['...

How to submit a form with more than 1 submit button. Sending a POST to a website. (Python)

I am creating a script using Python Mechanize that can log in to a website and submit a form. However, this form has 3 submit buttons (Preview, Post, and Cancel). I'm used to only one button... This is the form: <TextControl(subject=Is this good for the holidays? Anyone know about the new tech?)> <IgnoreControl(threads=<None>)> <Tex...

Python mechanize loses attributes on second open

This is a really specialized case and I feel awkward asking it; however, I'm at my wits' end working on it. I need to follow a tracking number through a form and to a results page, so I've been using mechanize in Python. The link after form submission is embedded in JavaScript, so I can't simply use follow_link. What I want to do is to regex out ...

Puzzled by RoR Mechanize

I'm trying to use mechanize to perform a simple search on my college's class schedule database. The following code returns nil; however, it works for logging into Facebook and searching Google (with a different URL and params). What am I doing wrong? I'm following the latest (great) Railscast here. The Mechanize documentation has been useful but I'm still puzzle...

Mechanize submit login form from http to https

I have a web page containing a login form which loads via HTTP, but it submits the data via HTTPS. I'm using python-mechanize to log into this site, but it seems that the data is submitted via HTTP. My code looks like this: import mechanize b = mechanize.Browser() b.open('http://site.com') form = b.forms().next() # the login form...

Mechanize with weird https form ror

I'm using RoR and trying to search a simple form at my college using mechanize. The code works fine for searching Google, but here it returns the search form in the results. I'm really confused. Any advice? Thanks! ruby script/console require 'mechanize' agent = WWW::Mechanize.new agent.get("https://www.owens.edu/cgi-bin/class.pl/") agent.page.form...

Mechanize with FakeWeb

I'm using Mechanize to extract the links from a page. To ease development, I'm using FakeWeb to get superfast responses, so there is less waiting and annoyance on every code run. tags_url = "http://website.com/tags/" FakeWeb.register_uri(:get, tags_url, :body => "tags.txt") agent = WWW::Mechanize.new page = agent.get(tags_url) page.lin...

How to set the mechanize page encoding?

Hi, I'm trying to get a page with ISO-8859-1 encoding by clicking on a link, so the code is similar to this: page_result = page.link_with( :text => 'link_text' ).click So far I get the result with the wrong encoding, so I see characters like: 'T�tulo:' instead of 'Título:' I've tried several approaches, including: Stating the enco...