My task is to find the actual press release links on a given page, for example http://www.apple.com/pr/.
My tool has to extract only the press release links from that URL, excluding advertisement links, tab (navigation) links, and any other links found on the site.
I wrote the program below, but it prints every link present on the given web page.
How can I modify it to find only the press release links from a given URL? I also want the program to be generic, so that it identifies press release links on any press release page it is given.
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class linksfind {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://www.apple.com/pr/");
            // Fetch and parse the page; the second argument is the timeout in milliseconds.
            Document document = Jsoup.parse(url, 1000);
            // Print the href of every anchor on the page; this is why all links are listed.
            for (Element element : document.getElementsByTag("a")) {
                System.out.println(element.attr("href"));
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
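
One direction I am considering is to filter the extracted hrefs with a URL heuristic. The following is only a rough sketch under assumptions I have not verified: that press release links stay on the same host as the listing page, and that their path contains a date such as /2012/03/. The class name `PressLinkFilter`, its methods, and the example.com URLs are hypothetical and only for illustration; it uses just the standard library, so the hrefs would come from the jsoup loop above (for instance via `element.attr("abs:href")`).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PressLinkFilter {
    // Assumption: press release URLs embed a year/month segment such as /2012/03/.
    private static final Pattern DATE_IN_PATH =
            Pattern.compile("/(19|20)\\d{2}/\\d{1,2}/");

    // Returns true if href (absolute or relative) looks like a press release
    // link when found on the listing page at listingUrl.
    static boolean looksLikePressRelease(String listingUrl, String href) {
        try {
            URI listing = URI.create(listingUrl);
            URI link = listing.resolve(href.trim()); // resolve relative links
            String host = link.getHost();
            // Reject mailto:, javascript: and links to other hosts.
            if (host == null || !host.equalsIgnoreCase(listing.getHost())) {
                return false;
            }
            String path = link.getPath();
            return path != null && DATE_IN_PATH.matcher(path).find();
        } catch (IllegalArgumentException e) {
            // Malformed href (for example, containing spaces): not a candidate.
            return false;
        }
    }

    // Keeps only the hrefs that pass the heuristic, preserving order.
    static List<String> filter(String listingUrl, List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            if (looksLikePressRelease(listingUrl, href)) {
                result.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical hrefs standing in for what the jsoup loop would print.
        List<String> hrefs = List.of(
                "/pr/library/2012/03/07-some-release.html",
                "/pr/bios/",
                "http://ads.example.net/2012/03/banner.html",
                "mailto:press@example.com");
        for (String href : filter("http://www.example.com/pr/", hrefs)) {
            System.out.println(href);
        }
    }
}
```

I realize a date pattern will miss sites that do not put dates in their URLs, so this probably is not generic enough on its own. Is there a more reliable signal I could combine it with?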