I extracted some HTML from a web page (http://www.opensolaris.org/os/community/on/flag-days/all/), as follows:
<tr class="build">
<th colspan="0">Build 110</th>
</tr>
<tr class="arccase project flagday">
<td>Feb-25</td>
<td></td>
<td></td>
<td></td>
<td>
<a href="../pages/2009022501/">Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br />
cpupm keyword mode extensions - <a href="/os/community/arc/caselog/2008/777/">PSARC/2008/777</a><br />
CPU Deep Idle Keyword - <a href="/os/community/arc/caselog/2008/663/">PSARC/2008/663</a><br />
</td>
</tr>
It contains some relative URL paths, and I want to find them with a regular expression and replace them with absolute URLs. I know urljoin can do the conversion, like this:
>>> from urllib.parse import urljoin  # Python 2: from urlparse import urljoin
>>> urljoin("http://www.opensolaris.org/os/community/on/flag-days/all/",
... "/os/community/arc/caselog/2008/777/")
'http://www.opensolaris.org/os/community/arc/caselog/2008/777/'
What I want to know is how to find them using regular expressions, so that the code finally becomes:
<tr class="build">
<th colspan="0">Build 110</th>
</tr>
<tr class="arccase project flagday">
<td>Feb-25</td>
<td></td>
<td></td>
<td></td>
<td>
<a href="http://www.opensolaris.org/os/community/on/flag-days/pages/2009022501/">Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br />
cpupm keyword mode extensions - <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/777/">PSARC/2008/777</a><br />
CPU Deep Idle Keyword - <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/663/">PSARC/2008/663</a><br />
</td>
</tr>
My knowledge of regular expressions is poor, so I would like to know how to do that. Thanks.
I have finished the work using Beautiful Soup, haha~ Thanks, everybody!
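For anyone who still wants the regex route asked about above, here is a rough sketch using only the standard library. The names `BASE_URL`, `HREF_RE` and `absolutize` are made up for illustration, and the pattern assumes every href is written with double quotes, as in the sample; an HTML parser is likely more robust on messier pages.

```python
import re
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Assumed base: the page the snippet was extracted from.
BASE_URL = "http://www.opensolaris.org/os/community/on/flag-days/all/"

# Hypothetical pattern: matches href="..." with double quotes only.
# Group 1 is the attribute prefix, group 2 the URL, group 3 the closing quote.
HREF_RE = re.compile(r'(href=")([^"]*)(")')

def absolutize(html, base=BASE_URL):
    """Rewrite every double-quoted href value as an absolute URL."""
    def repl(match):
        # urljoin leaves already-absolute URLs unchanged.
        return match.group(1) + urljoin(base, match.group(2)) + match.group(3)
    return HREF_RE.sub(repl, html)

snippet = '<a href="/os/community/arc/caselog/2008/777/">PSARC/2008/777</a>'
print(absolutize(snippet))
# <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/777/">PSARC/2008/777</a>
```

Note that urljoin resolves `../pages/2009022501/` against the base to `http://www.opensolaris.org/os/community/on/flag-days/pages/2009022501/`, because `..` steps out of the `all/` directory.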