ansaurus

Question

What HTML parsing libraries do you recommend in Java?

Answer 1

+6 A:

I have tried HTML Parser, which is dead simple.

pek 2008-08-25 18:55:11

I have used HTML Parser on a project and it worked exactly as expected.

Craig Angus 2008-09-27 00:21:14

But there are not many tutorials available...

Lily 2009-07-07 14:25:04

Answer 2

+1 A:

Do you need to do a full parse of the HTML? If you're just looking for specific values within the contents (a specific tag/param), then a simple regular expression might be enough, and could very well be faster.
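
As a rough sketch of what I mean, the snippet below pulls the text content out of one specific tag using only java.util.regex. The class name TagTextFinder, the method findTagText, and the sample markup are made up for illustration, and it assumes the tag is never nested inside itself and that no attribute value contains a ">" character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TagTextFinder {

    // Returns the text between every <tagName ...> and </tagName> pair.
    // Assumption: the tag is not nested inside itself.
    static List<String> findTagText(String html, String tagName) {
        String name = Pattern.quote(tagName);
        // Opening tag with optional attributes, non-greedy body, closing tag.
        Pattern pattern = Pattern.compile(
                "<" + name + "(?:\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

        List<String> results = new ArrayList<>();
        Matcher matcher = pattern.matcher(html);
        while (matcher.find()) {
            results.add(matcher.group(1)); // group 1 is the tag body
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<p>intro</p><tag>here</tag><tag attr=\"x\">test</tag>";
        System.out.println(findTagText(html, "tag")); // prints [here, test]
    }
}
```

This stays manageable only while the markup is predictable. Once tags can nest or the HTML is malformed, a real parser is the safer choice.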

Herms 2008-08-25 18:56:36

Answer 3

A:

@Herms yes, I agree that regex is good for simple text finding. Can you provide an example that will find the word "here" in the following text?

<tag>here</tag>
<tag attr="here">test</tag>
<here>test</here>

Also, just so this question includes everything, I would like to hear some library recommendations as well.

pek 2008-08-25 19:05:59

Answer 4

+10 A:

NekoHTML, TagSoup, and JTidy will allow you to parse HTML and then process with XML tools, like XPath.
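
To give an idea of the XPath half of that workflow, here is a minimal sketch that uses only the JDK. It assumes the input is already well-formed XHTML, so the JDK's own DocumentBuilder stands in for the clean-up step that NekoHTML, TagSoup, or JTidy would perform on real-world HTML. The class name XPathLinks, the method hrefs, and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class XPathLinks {

    // Returns the href value of every <a> element in a well-formed document.
    // Assumption: the input parses as XML; messy HTML would first need a
    // tolerant parser such as the ones named above.
    static List<String> hrefs(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//a/@href", doc, XPathConstants.NODESET);

            List<String> results = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                results.add(nodes.item(i).getNodeValue());
            }
            return results;
        } catch (Exception e) {
            // Wrap parser and XPath failures so callers need no checked handling.
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><a href=\"a.html\">A</a>"
                + "<p><a href=\"b.html\">B</a></p></body></html>";
        System.out.println(hrefs(xhtml)); // prints [a.html, b.html]
    }
}
```

With one of the tolerant parsers in front, the XPath part should look much the same; only the code that builds the document changes.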

jelovirt 2008-08-25 19:22:20

XPath is the way to go for HTML parsing. It also helps with badly formed HTML, where regex fails.

Sumit Ghosh 2010-05-14 10:09:14

Answer 5

A:

I am a newbie to HTML parsing. I know Java and HTML pretty well. I have come to know that HTMLParser is an easy tool to work with, but there are limited resources available to learn and use it. Can anyone suggest where to start?

TAM 2009-08-20 08:27:45

You should start a new question and reference this one.

Markus Lausberg 2009-08-20 08:30:20
