I want to parse some HTML to find the values of certain tags and attributes.
What HTML parsers do you recommend? Any pros and cons?
Do you need to do a full parse of the HTML? If you're just looking for specific values within the contents (a specific tag/param), then a simple regular expression might be enough, and could very well be faster.
@Herms yes, I agree that regex is good for simple text finding. Can you provide an example that will find the word "here" in the following text?
<tag>here</tag>
<tag attr="here">test</tag>
<here>test</here>
Also, just so this question includes everything, I would like to hear some library recommendations as well.
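For what it's worth, here is a rough sketch of what a regex for those three sample lines could look like in Java. The class name `HereRegexSketch` and the method `classify` are made up for illustration. It assumes double-quoted attribute values and that "here" is the entire text of the element, so it is tuned to these three lines and will likely break on other markup (single quotes, extra whitespace, nested tags, comments).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports in which role the word "here" appears.
class HereRegexSketch {
    // One alternative per sample line.
    static final Pattern HERE = Pattern.compile(
          "<(here)\\b"          // group 1: "here" as an opening tag name
        + "|=\\s*\"(here)\""    // group 2: "here" as a double-quoted attribute value
        + "|>(here)<");         // group 3: "here" as the whole text of an element

    static List<String> classify(String html) {
        List<String> roles = new ArrayList<>();
        Matcher m = HERE.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                roles.add("tag name");
            } else if (m.group(2) != null) {
                roles.add("attribute value");
            } else {
                roles.add("element text");
            }
        }
        return roles;
    }

    public static void main(String[] args) {
        String sample = "<tag>here</tag>\n"
                      + "<tag attr=\"here\">test</tag>\n"
                      + "<here>test</here>";
        System.out.println(classify(sample));
    }
}
```

Even for this tiny input the pattern needs three separate alternatives, which is a fair hint that anything beyond a very narrow lookup is better served by a real parser.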
NekoHTML, TagSoup, and JTidy will allow you to parse HTML and then process with XML tools, like XPath.
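To give an idea of the XPath side, here is a minimal sketch that uses only the JDK's built-in XML parser and XPath engine. It assumes the markup is already well-formed, which real HTML often is not; that is the part one of the libraries above would take care of, and their setup is not shown here. The class name `XPathHereSketch`, the helper methods, and the `<root>` wrapper element are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: counts nodes matching an XPath expression.
class XPathHereSketch {
    // Parses well-formed markup with the strict JDK parser.
    // A tolerant HTML parser would stand in here for real-world pages.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates the expression against the document and returns the match count.
    static int count(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The three sample lines from the comment above, wrapped in one root.
        Document doc = parse("<root>"
                + "<tag>here</tag>"
                + "<tag attr=\"here\">test</tag>"
                + "<here>test</here>"
                + "</root>");
        System.out.println(count(doc, "//*[text()='here']")); // elements whose text is "here"
        System.out.println(count(doc, "//@*[.='here']"));     // attributes whose value is "here"
        System.out.println(count(doc, "//here"));             // elements named "here"
    }
}
```

Each of the three expressions matches exactly one node in that sample. Once the document is a tree, finding a value by role (text, attribute, or tag name) is a one-line query rather than a hand-tuned pattern.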
I am new to HTML parsing, though I know Java and HTML pretty well. I have heard that HTMLParser is an easy tool to work with, but there are few resources available for learning and using it. Can anyone suggest where to start?