I want to grab the list of players from http://www.atpworldtour.com/Rankings/Singles.aspx
There is a table with class "bioTableAlt"; we have to grab every <tr> after the first one (class "bioTableHead"), which is used for the table heading.
The wanted content looks like:
<tr class="oddRow">
<td>2</td>
<td>
<a href="/Tennis/Players/Top-Players/Novak-Djokovic.aspx">Djokovic, Novak</a>
(SRB)
</td>
<td>
<a href="/Tennis/Players/Top-Players/Novak-Djokovic.aspx?t=rb">6,905</a>
</td>
<td>0</td>
<td>
<a href="/Tennis/Players/Top-Players/Novak-Djokovic.aspx?t=pa&m=s">21</a>
</td>
</tr>
<tr>
<td>3</td>
<td>
<a href="/Tennis/Players/Top-Players/Roger-Federer.aspx">Federer, Roger</a>
(SUI)
</td>
<td>
<a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=rb">6,795</a>
</td>
<td>0</td>
<td>
<a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=pa&m=s">21</a>
</td>
</tr>
I think the best idea is to create an array(), make each <tr> a unique entry, and write the final result to the list.txt file, like:
Array (
[2] => stdClass Object (
[name] => Djokovic, Novak
[country] => SRB
[rank] => 6,905
)
[3] => stdClass Object (
[name] => Federer, Roger
[country] => SUI
[rank] => 6,795
)
)
We're parsing each <tr>:
[2] is the number from the first <td>
[name] is the text of the link inside the second <td>
[country] is the value between (...) in the second <td>
[rank] is the text of the link inside the third <td>
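To make the mapping above concrete, here is a minimal sketch of how the rows could be parsed with PHP's DOMDocument and DOMXPath. It runs against an inline sample modeled on the rows quoted earlier rather than the live page; the header row's cells and the variable names ($html, $players) are assumptions for illustration, and fetching the real page is not shown.

```php
<?php
// Inline sample modeled on the quoted rows. The header row's contents are
// an assumption; only its class ("bioTableHead") comes from the page.
$html = <<<'MARKUP'
<table class="bioTableAlt">
<tr class="bioTableHead"><td>Rank</td><td>Name</td><td>Points</td></tr>
<tr class="oddRow">
<td>2</td>
<td><a href="/Tennis/Players/Top-Players/Novak-Djokovic.aspx">Djokovic, Novak</a> (SRB)</td>
<td><a href="/Tennis/Players/Top-Players/Novak-Djokovic.aspx?t=rb">6,905</a></td>
</tr>
<tr>
<td>3</td>
<td><a href="/Tennis/Players/Top-Players/Roger-Federer.aspx">Federer, Roger</a> (SUI)</td>
<td><a href="/Tennis/Players/Top-Players/Roger-Federer.aspx?t=rb">6,795</a></td>
</tr>
</table>
MARKUP;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Every <tr> of the table except the heading row.
$rows = $xpath->query(
    '//table[contains(@class, "bioTableAlt")]//tr[not(contains(@class, "bioTableHead"))]'
);

$players = array();
foreach ($rows as $row) {
    $cells = $xpath->query('td', $row);
    if ($cells->length < 3) {
        continue; // not a player row
    }
    $nameLink = $xpath->query('a', $cells->item(1))->item(0);
    $rankLink = $xpath->query('a', $cells->item(2))->item(0);
    if ($nameLink === null || $rankLink === null) {
        continue; // skip rows without the expected links
    }

    $id = (int) trim($cells->item(0)->textContent);

    $player = new stdClass();
    $player->name = trim($nameLink->textContent);
    // Country code is whatever sits between the parentheses in the second cell.
    $player->country = preg_match('/\(([^)]+)\)/', $cells->item(1)->textContent, $m)
        ? $m[1]
        : '';
    $player->rank = trim($rankLink->textContent);

    $players[$id] = $player;
}

// Dump the array in print_r format, as in the desired list.txt.
file_put_contents('list.txt', print_r($players, true));
```

Filtering on the "bioTableHead" class rather than on row position keeps the sketch working even if the parser wraps the rows in an extra element.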
In the end, the list.txt file should contain an array() with ~100 IDs (we are grabbing the page with the first 100 players).
Additionally, it would be amazing if we could make a small fix for each [name] before adding it to the array(): "Federer, Roger" should be converted to "Roger Federer" (just take the part before the comma and move it to the end of the line).
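The name fix could be sketched roughly like this. flipName is a hypothetical helper name, not something from the page or an existing library; it splits on the first comma only, so multi-word surnames stay together.

```php
<?php
// Hypothetical helper: turns "Last, First" into "First Last".
function flipName($name)
{
    // Split on the first comma only.
    $parts = explode(',', $name, 2);
    if (count($parts) < 2) {
        return trim($name); // no comma, leave the name as it is
    }
    return trim($parts[1]) . ' ' . trim($parts[0]);
}

echo flipName('Federer, Roger'), "\n";  // Roger Federer
echo flipName('Djokovic, Novak'), "\n"; // Novak Djokovic
```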
Thanks.