views:

510

answers:

2

I thought that mechanize follows redirection by default ... by my script ends at the redirection page. How can I handle this?

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new 

page = agent.get("http://www.vbulletin.org/forum/index.php")

login_form = page.form_with(:action => 'login.php?do=login')

login_form['vb_login_username'] = 'user name'
login_form['vb_login_password'] = ''
login_form['vb_login_md5password_utf'] = 'md5 hash from the password'
login_form['vb_login_md5password'] = 'md5 hash from the password'

page = agent.submit login_form

#Display welcome message if logged in
puts page.parser.xpath("/html/body/div/table/tr/td[2]/div/div").xpath('text()').to_s.strip

output = File.open("login.html", "w") {|f| f.write(page.parser.to_html) }

redirection page html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"&gt;
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head>
<body>
<noscript>

    <meta http-equiv="Refresh" content="2; URL=http://www.vbulletin.org/forum/index.php"&gt;
</noscript>



    <script type="text/javascript">

    <!--

    function exec_refresh()

    {

        window.status = "Redirecting..." + myvar;

        myvar = myvar + " .";

        var timerID = setTimeout("exec_refresh();", 100);

        if (timeout > 0)

        {

            timeout -= 1;

        }

        else

        {

            clearTimeout(timerID);

            window.status = "";

            window.location = "http://www.vbulletin.org/forum/index.php";

        }

    }



    var myvar = "";

    var timeout = 20;

    exec_refresh();

    //-->

    </script><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="generator" content="vBulletin 3.6.12">
<meta name="keywords" content="vbulletin,forum,bbs,discussion,jelsoft,bulletin board,hacks,modifications,addons,articles,programming,site review">
<meta name="description" content="This is a discussion forum powered by vBulletin. To find out about vBulletin, go to http://www.vbulletin.com/ .">
<!-- CSS Stylesheet --><link rel="stylesheet" type="text/css" href="clientscript/vbulletin_css/style-befeb917-00023.css" id="vbulletin_css">
<link rel="stylesheet" type="text/css" href="clientscript/blue.css" id="blue">
<!-- / CSS Stylesheet --><script type="text/javascript">

<!--

var SESSIONURL = "";

var SECURITYTOKEN = "c1e3de2fd54e2938c4ab1e80ae448aa6bbea25b2";

var IMGDIR_MISC = "images/misc";

var vb_disable_ajax = parseInt("0", 10);

// -->

</script><script type="text/javascript" src="clientscript/vbulletin_global.js?v=3612"></script><link rel="alternate" type="application/rss+xml" title="vBulletin.org Forum RSS Feed" href="external.php?type=RSS2">
<!-- VB.ORG --><!-- The line above sets the script var and must be left in --><script type="text/javascript"> 

<!--

script = "login";

userid = "0";  

forumid = ""; 

threadid = ""; 

authorid = ""; 

// -->

</script><script type="text/javascript" src="clientscript/vborg_miscactions.js?v=3612"></script><!-- /VB.ORG --><title>vBulletin.org Forum</title>
<div id="container" style="border-width:0;width:950px">

<br><br><br><br><form action="http://www.vbulletin.org/forum/index.php" method="post" name="postvarform">

<table class="tborder" cellpadding="4" cellspacing="1" border="0" align="center">
<tr>
<td class="tcat">Redirecting...</td>

</tr>
<tr>
<td class="panelsurround" align="center">

    <div class="panel">



        <blockquote>

            <p> </p>

            <p><strong>Thank you for logging in, my username.</strong></p>          



                <p class="smallfont"><a href="http://www.vbulletin.org/forum/index.php"&gt;Click here if your browser does not automatically redirect you.</a></p>

                <div> </div>



        </blockquote>



    </div>

    </td>

</tr>
</table>
</form>





<br><br><br><br>
</div>



</body>
</html>
+2  A: 

Are you sure it's HTTP 30x Redirect response? Not normal "200 OK" response with page containing <meta http-equiv="refresh" content="100;url=...">. Mechanize follows first type of redirection, second is just normal response with link, which you should follow.

I can't check, because you didn't posted username/password in code.

Edit:
After your update, this should be enough to follow redirection after login:

page.link_with(:text => "Click here if your browser does not automatically redirect you.").click
MBO
@MBO: how can I check what type it is? I do not have to click anything in my browser, the page refreshes itself. I added the redirection page html in my question. Can I ask how do you know what type of redirection mechanize follows?
Radek
@Radek I updated my answer
MBO
@Radek your other option would be to just have mechanize get "http://www.vbulletin.org/forum/index.php" (or rather the href of the link @MBO mentioned) after you've been sent to this page.
Tim Snowhite
that's definitely the way how to solve it ... I found something else. See my answer
Radek
+4  A: 

there is a option in mechanize for that. Mechanize does not follow meta refreshes by default, we can set that option though.

agent.follow_meta_refresh = true
Radek
+1 Nice, I didn't know about that option
MBO
niiiiccccceeeeee
Zombies