I'm using HTML Parser to fetch links from a web page. I need to store the URL, link text and the URL to the parent page containing the link. I have managed to get the link URL as well as the parent URL.
I still ned to get the link text.
<a href="url">link text</a>
Unfortunately I'm having a hard time figuring it out, any help would be greatly appreciated.
public static List<LinkContainer> findUrls(String resource) {
String[] tagNames = {"A", "AREA"};
List<LinkContainer> urls = new ArrayList<LinkContainer>();
Tag tag;
String url;
String sourceUrl;
try {
for (String tagName : tagNames) {
Parser parser = new Parser(resource);
NodeList nodes = parser.parse(new TagNameFilter(tagName));
NodeIterator i = nodes.elements();
while (i.hasMoreNodes()) {
tag = (Tag) i.nextNode();
url = tag.getAttribute("href");
sourceUrl = tag.getPage().getUrl();
if (RegexUtil.verifyUrl(url)) {
urls.add(new LinkContainer(url, null, sourceUrl));
}
}
}
} catch (ParserException pe) {
pe.printStackTrace();
}
return urls;
}