I have been given a staff list which is supposed to be up to date but it doesn't match an intranet People Finder which is written in ASP.NET.
As the information is sensitive I am not able to access the database the People Finder is using so the only way I can get at the information is by scraping the structure starting at the top brass at the top and then going through each tier in turn.
Each person has a Staff number which then forms the URL http://intranet/peoplefinder/index.aspx?srn=ABC1234
and then all the people who report to them are listed underneth in the format <a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" target="_self">
where each URL indicates the Staff number and provides a link to their team.
The trouble arises when the teams are big as paging is implemented in the GridView with an URL such as <a href="javascript:__doPostBack('gvEmployees','Page$2')">2</a>
.
How would I scrape this page, capture the SRN and other details along with the people who report to the person on all pages of the GridView then loop through each reportee and do the same process until the whole list is complete?
Example HTML of result
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head><title>
People Finder: Name Surname
</title><link rel="stylesheet" href="/path/to/style.css" type="text/css" /><link rel="stylesheet" href="/path/to/anotherStyle.css" type="text/css" />
<script type="text/javascript" src="/path/to/peoplefinder.js"></script>
</head>
<body>
<form name="form1" method="post" action="/path/to/index.aspx" id="form1">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="### ViewState ###" />
</div>
<script type="text/javascript">
<!--
var theForm = document.forms['form1'];
if (!theForm) {
theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
// -->
</script>
<script src="/path/to/WebResource.axd?d=AueXWrgAf8xSxMTAt1Q4AA2&t=633311832634916698" type="text/javascript"></script>
<div class="HP3CHeader">
<div id="LWHPBanner">
<h1><span id="lblName">Name Surname</span></h1>
</div>
</div>
<div id='CPMain'>
<div id="mainBox">
<div id="pnlEmployeeDetails">
<div id='basicData'>
<img id="imgPhoto" class="photo" src="/path/to/photo.jpg" style="height:69px;width:69px;border-width:0px;" />
<span id="lblBusinessUnit">Business Unit</span>
<span id="lblCostCentreName">Cost Centre</span>
<span id="lblLocation">Location</span>
<a href='/path/to/checkcontactdetails.htm' target='_blank' onclick='return OpenCheckContactDetails();' >Find out how to change your details/photo.</a>
<div id="manager">
<strong>Reports to: </strong><a id="hlManager" href="/path/to/index.aspx?srn=ABC1234">Name Surname</a>
</div>
</div>
<div id='contactData'>
<div id="pnlSrn">
<strong>Staff number:</strong> <span id="lblSrn">ABC1234</span>
</div>
<div id="pnlEmailAddress">
<strong>Email Address:</strong> <span id="lblEmailAddress">Email</span>
</div>
<div style="clear: both"></div>
</div>
</div>
<div id="pnlGrid">
<h3><span id="lblGridTitle">Name's team</span></h3>
<div>
<table class="subordinates" cellspacing="0" cellpadding="2" rules="cols" border="1" id="gvEmployees" style="border-style:None;border-collapse:collapse;">
<tr style="color:Black;background-color:#EFF3FB;border-style:None;font-weight:bold;">
<th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$SRN')" style="color:Black;">SRN</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$FullName')" style="color:Black;">Full name</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$RACFID')" style="color:Black;">RACFID</a></th>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl02_lnkFullName" href="index.aspx?srn=1K5932" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl03_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl04_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl05_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl06_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl07_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl08_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl09_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl10_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl11_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="PagerStyle" style="color:#000039;border-style:None;">
<td colspan="3"><table border="0">
<tr>
<td><span>1</span></td><td><a href="javascript:__doPostBack('gvEmployees','Page$2')" style="color:#000039;">2</a></td>
</tr>
</table></td>
</tr>
</table>
</div>
</div>
</div>
<div id="searchBox">
<strong>Search People Finder:</strong>
<br /><br />
<span>Forename:</span><br/>
<span><input name="txtFirstname" type="text" id="txtFirstname" /></span><br/>
<span>Surname:</span><br/>
<span><input name="txtSurname" type="text" id="txtSurname" /></span><br/>
<span>RACFID:</span><br/>
<span><input name="txtRacfid" type="text" id="txtRacfid" /></span><br/>
<span>Staff number:</span><br/>
<span><input name="txtSrn" type="text" id="txtSrn" /></span><br/>
<div class="searchBoxItem" style="text-align:center;width:100%"><input type="submit" name="btnFind" value="Search" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("btnFind", "", false, "", "index.aspx", false, false))" id="btnFind" title="Search for employees member" class="button" style="border-style:Outset;" /></div><br/>
<div>People Finder searches only UK staff.</div>
<!-- <div><a class="execBoardLink" href="/path/to/index.aspx?srn=ABC1234">Show Executive Board</a></div> -->
<div style="margin-top:5px;"><a href="/path/to/phonebook" target="phoneBook" onclick='return OpenPhonebook();' title="Open Phonebook in new window">Open Phonebook</a></div>
</div>
</div>
<div class="contentFooter" style="text-align:center;">
<table width="100%" cellpadding="0" cellspacing="0" border="0" summary="Navigation layout table">
<tr>
<td align="left"><span class="linkArrow"><</span> <a href="javascript:history.back();">Back</a></td>
<td align="center"></td>
<td align="right"><span class="linkArrow">^ </span><a href="#top">Top</a></td>
</tr>
</table>
</div>
<div>
<input type="hidden" name="__PREVIOUSPAGE" id="__PREVIOUSPAGE" value="vy066Txz34y1E515UsTSTDabHKEmdBRCsq7xM0lpJls1" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWCgKM3uTTAgLP/83pDwLfwaTTAQKNguzjCAKt98LeCwLZh62pDwKKqdGpBwLd2q7jAwKa+5aMBAL5zb65C42zY4GBEUKujhjtZ/hZ8sLESfiF" />
</div></form>
</body>
</html>