Hi, I'm trying to connect to a website (source code below) that requires login and then browse it to download some files. I've managed to do this for another website using this code:

public void initConnection(String _path, Map<String,String> _parameters) throws IOException {

    String data = convertMapToParams(_parameters);

    // Send data
    URL url = new URL(host + "/" + _path);
    URLConnection conn = url.openConnection();
    conn.setDoOutput(true);

    OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
    wr.write(data);
    wr.flush();
    wr.close();

    sessionCookie = conn.getHeaderField("Set-Cookie");
    sessionCookie = sessionCookie.substring(0,sessionCookie.indexOf(";"));
}
public List<String> getHtml(String _path, Map<String, String> _parameters) throws IOException {

    String data = convertMapToParams(_parameters);

    URL url = new URL(host + "/" + _path);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setDoOutput(true);

    conn.setRequestProperty("Cookie", sessionCookie);

    OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
    wr.write(data);
    wr.flush();
    wr.close();

    List<String> list = new LinkedList<String>();

    BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = rd.readLine()) != null) {
        list.add(line);
    }

    rd.close();

    return list;
}

The problem is that on this website, when I do this:

sessionCookie = conn.getHeaderField("Set-Cookie");

I get sessionCookie == null, so I cannot obtain any cookie to keep the session open. If I inspect the headers on the conn variable to check whether there is any cookie field in there, I get this (from the IntelliJ IDEA debugger):

[0] = {java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@2085}"null=[HTTP/1.1 200 OK]"
[1] = {java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@2093}"X-AspNet-Version=[2.0.50727]"
[2] = {java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@2102}"Date=[Wed, 18 Aug 2010 07:32:37 GMT]"
[3] = {java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@2111}"Content-Length=[3686]"
[4] = {java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@2120}"Content-Type=[text/html; charset=utf-8]"
[5] = {java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@2129}"Server=[Microsoft-IIS/6.0]"
[6] = {java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@2138}"X-Powered-By=[ASP.NET]"
[7] = {java.util.Collections$UnmodifiableMap$UnmodifiableEntrySet$UnmodifiableEntry@2147}"Cache-Control=[private]"

But using the Firefox add-on "HttpFox" to check for cookies, I found that there are some:

(Request-Line)  POST /companias/entrada.aspx HTTP/1.1
User-Agent  Mozilla/5.0 (Windows; U; Windows NT 6.1; es-ES; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
Accept  text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language es-es,es;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding gzip,deflate
Accept-Charset  ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive  115
Connection  keep-alive
Cookie  __utma=235757843.1141928071.1280949246.1282083861.1282114987.11; __utmz=235757843.1280949246.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=235757843
Content-Type    application/x-www-form-urlencoded
Content-Length  381

Another thing that confused me was these fields in the source code: "__VIEWSTATE", "__EVENTVALIDATION", "__EVENTTARGET", "__LASTFOCUS" and "__EVENTARGUMENT". I've been searching for information about them and, if I understood it right, you can use __VIEWSTATE to control the user's session, but I don't know how it works.
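
A note on those fields, as far as I understand them: they are hidden inputs generated by ASP.NET Web Forms, and the server normally expects them to be posted back together with the visible form fields. __VIEWSTATE carries the state of the page's controls rather than the login session itself. The sketch below shows one way to pull the hidden fields out of the page and build the POST body. The class name `HiddenFields` and its methods are made up for illustration, and the regular expression assumes the attribute order seen in the page source in this post (type, name, id, value).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class HiddenFields {

    // Assumption: attributes appear as type, name, ..., value (as in the page source shown).
    private static final Pattern HIDDEN_INPUT = Pattern.compile(
            "<input\\s+type=\"hidden\"\\s+name=\"([^\"]*)\"[^>]*?\\svalue=\"([^\"]*)\"");

    /** Returns the hidden inputs of the page as name -> value, in document order. */
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Matcher m = HIDDEN_INPUT.matcher(html);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    static String toFormBody(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                // Base64 values contain '+', '/' and '=', which must be percent-encoded.
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Shortened, made-up values standing in for the real page.
        String html = "<input type=\"hidden\" name=\"__EVENTTARGET\" id=\"__EVENTTARGET\" value=\"\" />\n"
                + "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"/wEP+x=\" />";
        Map<String, String> fields = extract(html);
        fields.put("LoginSteps$UserName", "user");      // placeholder credentials
        fields.put("LoginSteps$Password", "secret");
        System.out.println(toFormBody(fields));
    }
}
```

The idea would be to GET the login page first, extract the hidden fields from that response, add the user name and password, and send the result as the body of the POST.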

So, in short: on the other website I used that simple getHeaderField("Set-Cookie") call to get the cookie and keep the session alive. Here I don't know whether the website uses cookies at all, and I don't know whether cookies are the way to go or whether I have to use the __VIEWSTATE field instead.
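
For reference, getHeaderField("Set-Cookie") returns null when the response has no such header, and calling substring on that result then fails. A more defensive variant could read every Set-Cookie value from conn.getHeaderFields() instead. This is only a sketch: the class `CookieHeaders` and the method `toCookieHeader` are hypothetical names, and the cookie values in `main` are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library.
class CookieHeaders {

    /**
     * Builds a "Cookie" request header value from all Set-Cookie response headers.
     * Returns null when the response carried no cookies at all.
     */
    static String toCookieHeader(Map<String, List<String>> headers) {
        List<String> pairs = new ArrayList<String>();
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = entry.getKey();
            // The status line is stored under a null key, so guard against it.
            if (name == null || !name.equalsIgnoreCase("Set-Cookie")) {
                continue;
            }
            for (String value : entry.getValue()) {
                // Keep only "name=value" and drop attributes such as path or HttpOnly.
                int semicolon = value.indexOf(';');
                pairs.add((semicolon >= 0 ? value.substring(0, semicolon) : value).trim());
            }
        }
        if (pairs.isEmpty()) {
            return null;
        }
        StringBuilder header = new StringBuilder();
        for (String pair : pairs) {
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(pair);
        }
        return header.toString();
    }

    public static void main(String[] args) {
        // Stand-in for conn.getHeaderFields(); the cookie values are made up.
        Map<String, List<String>> headers = new HashMap<String, List<String>>();
        headers.put(null, Arrays.asList("HTTP/1.1 200 OK"));
        headers.put("Set-Cookie", Arrays.asList(
                "ASP.NET_SessionId=abc123; path=/; HttpOnly",
                "lang=es; path=/"));
        System.out.println(toCookieHeader(headers));
    }
}
```

One thing that may be worth checking, and this is only a guess about this particular site: HttpURLConnection follows redirects by default, so if the login POST is answered with a redirect, headers set on that intermediate response would not appear on the final one. Calling conn.setInstanceFollowRedirects(false) before sending the request would make it possible to inspect that first response.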

I am not very experienced with Java yet, and even less with networking. I was recommended here to use Apache HttpClient for these things and I'm reading about it, but I have so many things mixed up right now that I first need to know which way to go with this site.

And finally, this is part of the source code from this website:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >

    <head><title>
    Steps Peritaciones S.L.
</title><link href="../Styles/general.css" rel="stylesheet" type="text/css" />
        <style type="text/css">
        </style>
    </head>

    <body>
        <form name="form1" method="post" action="entrada.aspx" onsubmit="javascript:return WebForm_OnSubmit();" id="form1">

<div>
<input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUJODcxMzI1MDYzZBgBBR5fX0NvbnRyb2xzUmVxdWlyZVBvc3RCYWNrS2V5X18WAQUbTG9naW5TdGVwcyRMb2dpbkltYWdlQnV0dG9udl7bDlN22j9J5Z5UXZi+FLbU6hk=" />
</div>

<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['form1'];
if (!theForm) {
    theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>
</script>

<div>

    <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBAKZ/NOFAgK6jd26DgKovcvMBwKV8YLlBGCk0AytR6jZVZxOJwJ59H/uIN21" />
</div>
            <div class="logoEntrada">
                <img src="../images/logo_steps_p.gif" alt="Steps Peritaciones S.L." />
            </div>
            <div class="LoginForm" >
                <br />
                <br />
                <span id="Label1" class="TitolEntrada">Acceso Compañias</span>

                <br />
            </div>
            <div class="LoginForm">
                <center>
                    <table class="LoginBox" cellspacing="0" cellpadding="4" border="0" id="LoginSteps" style="background-color:#E3EAEB;border-color:#E6E2D8;border-width:1px;border-style:Solid;border-collapse:collapse;">
    <tr>
        <td><table cellpadding="0" border="0" style="color:#333333;font-family:Verdana;font-size:1em;width:234px;">
            <tr>
                <td align="center" style="color:White;background-color:#1C5E55;font-size:1em;font-weight:bold;">Entrada</td>

            </tr><tr>
                <td><label for="LoginSteps_UserName">Usuario:</label></td>
            </tr><tr>
                <td><input name="LoginSteps$UserName" type="text" id="LoginSteps_UserName" style="font-size:1em;width:171px;" /><span id="LoginSteps_UserNameRequired" title="El nombre de usuario es obligatorio." style="color:Red;visibility:hidden;">*</span></td>
            </tr><tr>
                <td><label for="LoginSteps_Password">Contraseña:</label></td>
            </tr><tr>

                <td><input name="LoginSteps$Password" type="password" id="LoginSteps_Password" style="font-size:1em;width:171px;" /><span id="LoginSteps_PasswordRequired" title="La contraseña es obligatoria." style="color:Red;visibility:hidden;">*</span></td>
            </tr><tr>
                <td align="right"><input type="submit" name="LoginSteps$LoginButton" value="Entrar" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;LoginSteps$LoginButton&quot;, &quot;&quot;, true, &quot;LoginSteps&quot;, &quot;&quot;, false, false))" id="LoginSteps_LoginButton" style="color:#1C5E55;background-color:White;border-color:#C5BBAF;border-width:1px;border-style:Solid;font-family:Verdana;font-size:1em;" /></td>
            </tr>
        </table></td>
    </tr>
</table>
                </center>
            </div>



<script type="text/javascript">
//<![CDATA[
var LoginSteps_UserNameRequired = document.all ? document.all["LoginSteps_UserNameRequired"] : document.getElementById("LoginSteps_UserNameRequired");
LoginSteps_UserNameRequired.controltovalidate = "LoginSteps_UserName";
LoginSteps_UserNameRequired.errormessage = "El nombre de usuario es obligatorio.";
LoginSteps_UserNameRequired.validationGroup = "LoginSteps";
LoginSteps_UserNameRequired.evaluationfunction = "RequiredFieldValidatorEvaluateIsValid";
LoginSteps_UserNameRequired.initialvalue = "";
var LoginSteps_PasswordRequired = document.all ? document.all["LoginSteps_PasswordRequired"] : document.getElementById("LoginSteps_PasswordRequired");
LoginSteps_PasswordRequired.controltovalidate = "LoginSteps_Password";
LoginSteps_PasswordRequired.errormessage = "La contraseña es obligatoria.";
LoginSteps_PasswordRequired.validationGroup = "LoginSteps";
LoginSteps_PasswordRequired.evaluationfunction = "RequiredFieldValidatorEvaluateIsValid";
LoginSteps_PasswordRequired.initialvalue = "";
//]]>
</script>


<script type="text/javascript">
//<![CDATA[

var Page_ValidationActive = false;
if (typeof(ValidatorOnLoad) == "function") {
    ValidatorOnLoad();
}

function ValidatorOnSubmit() {
    if (Page_ValidationActive) {
        return ValidatorCommonOnSubmit();
    }
    else {
        return true;
    }
}
        WebForm_AutoFocus('LoginSteps');Sys.Application.initialize();

document.getElementById('LoginSteps_UserNameRequired').dispose = function() {
    Array.remove(Page_Validators, document.getElementById('LoginSteps_UserNameRequired'));
}

document.getElementById('LoginSteps_PasswordRequired').dispose = function() {
    Array.remove(Page_Validators, document.getElementById('LoginSteps_PasswordRequired'));
}
//]]>

Thanks and I hope it's not too much code in one post :S

P.S.: These websites belong to my job and I have authorized access to them, so this is not a hacking thing. I just want to automate the process and learn while I'm at it.

+1  A: 

OMG, you do it all by hand? I would really suggest you use HtmlUnit instead: it gives you a virtual web client with all its capabilities and a higher-level API, so you can focus on interacting with the website instead of opening streams by hand.

Riduidel
-1 because this isn't really an answer. It's one thing to say "the problem is x, but I recommend you do it y". It's another to say "omg that's too hard, I just do y".
PP
PARTICULARLY because this question was very specific. That is, the session/cookie handling. Your answer completely failed to address this specific question.
PP
Yes, for sure, except that session/cookie handling is a tough thing to do by hand, and it is precisely what HtmlUnit handles with ease.
Riduidel
Oops, I cannot mark 2 answers as correct. Although the tutorial will help me with managing cookies by hand, I think this HtmlUnit looks quite interesting. Maybe if I had known about it before, I would not have gone "the hard way" :) Thanks for the info!
oli206
+2  A: 

Alternatively, you can use HttpClient.

Here is a tutorial for it:

http://hc.apache.org/httpcomponents-client-4.0.1/tutorial/html/

Check the following section on cookies (state management):

http://hc.apache.org/httpcomponents-client-4.0.1/tutorial/html/statemgmt.html
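
To illustrate what "state management" means here without pulling in an extra library, the sketch below uses the JDK's own java.net.CookieManager (available since Java 6). To be clear, this is not HttpClient code, just the same idea of a cookie jar that stores what the server sets and replays it on later requests. The class `CookieJarDemo`, the method `roundTrip`, the example.com URL and the cookie value are all made up.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of any library.
class CookieJarDemo {

    /**
     * Stores the given Set-Cookie values as if they came from a response for uri,
     * then returns the Cookie header values the jar would send back to that uri.
     */
    static List<String> roundTrip(CookieManager jar, URI uri, String... setCookieValues) {
        Map<String, List<String>> response = new HashMap<String, List<String>>();
        response.put("Set-Cookie", Arrays.asList(setCookieValues));
        try {
            jar.put(uri, response);
            Map<String, List<String>> request =
                    jar.get(uri, new HashMap<String, List<String>>());
            List<String> cookies = request.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager jar = new CookieManager();
        // Once installed as the default handler, HttpURLConnection consults the jar
        // on every request, so no manual Set-Cookie parsing is needed.
        CookieHandler.setDefault(jar);

        URI login = URI.create("http://example.com/companias/entrada.aspx");
        System.out.println(roundTrip(jar, login, "ASP.NET_SessionId=abc123; path=/; HttpOnly"));
    }
}
```

HttpClient's cookie store, described in the state management chapter linked above, plays the same role for requests made through HttpClient.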

YoK
Thanks, I actually read almost all of section 1 of the tutorial and totally missed the cookies part!
oli206