ansaurus

Question

What is the best HTML parser for Java?

Answer 1

A:

The best would be the one that gets the job done right.

There is an open-source one called TagSoup, and also JTidy.

VoodooChild 2010-06-25 20:19:26

Answer 2

+3 A:

I would recommend Jsoup for this. It has a very nice API with support for jQuery-like CSS selectors and non-verbose element iteration. As an example, this prints your own question and the names of all answerers here:

URL url = new URL("http://stackoverflow.com/questions/3121136");
Document document = Jsoup.parse(url, 3000);

String question = document.select("#question .post-text").text();
System.out.println("Question: " + question);

Elements answerers = document.select("#answers .user-details a");
for (Element answerer : answerers) {
    System.out.println("Answerer: " + answerer.text());
}

An alternative would be XPath, but Jsoup is more useful for web developers who already have a good grasp of CSS selectors.
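For comparison, the XPath route could look roughly like the sketch below, which uses only the JDK's built-in javax.xml.xpath API. It assumes the markup is already well-formed XML (XHTML); real-world HTML would usually have to be cleaned up first, for example with a tool such as TagSoup or JTidy. The class name, the method names, and the sample markup are made up for illustration and only mimic the page structure queried above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Hypothetical, well-formed sample markup (an assumption, not the real page).
    static final String SAMPLE = "<html><body>"
            + "<div id='question'><div class='post-text'>What is the best HTML parser for Java?</div></div>"
            + "<div id='answers'>"
            + "<div class='user-details'><a href='/users/1'>VoodooChild</a></div>"
            + "<div class='user-details'><a href='/users/2'>BalusC</a></div>"
            + "</div></body></html>";

    // Returns the text of the question body.
    static String questionText(String xhtml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(
                "//div[@id='question']//div[@class='post-text']",
                new InputSource(new StringReader(xhtml)));
    }

    // Returns the link text of every answerer.
    // Note: @class='user-details' matches the exact attribute value only,
    // unlike the CSS selector .user-details, which matches one class among several.
    static List<String> answererNames(String xhtml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList links = (NodeList) xpath.evaluate(
                "//div[@id='answers']//div[@class='user-details']/a",
                new InputSource(new StringReader(xhtml)),
                XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < links.getLength(); i++) {
            names.add(links.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Question: " + questionText(SAMPLE));
        for (String name : answererNames(SAMPLE)) {
            System.out.println("Answerer: " + name);
        }
    }
}
```

The XPath expressions are noticeably longer than the equivalent CSS selectors, which is the trade-off described above.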

BalusC 2010-06-25 20:38:37

Thanks! This looks great.

egervari 2010-06-26 23:54:01

You're welcome.

BalusC 2010-06-26 23:57:25
