I want to extract all the links from a page. I am using HTML::LinkExtor. How do I extract only the links that point to HTML content pages?
I also cannot extract links of this kind:
javascript:openpopup('http://www.admissions.college.harvard.edu/financial_aid/index.html')
EDIT: By HTML pages I mean resources served as text/html. I am not indexing pictures, etc.