Hi, I am writing an application in Java and I need to fetch specific data from a website. I do not know whether to use a regex or a parser. Can anybody please advise me how to get this done, and which one is preferred?
Thanks
Definitely get an HTML parser.
Here is a comparison of a few Java HTML parsers.
Two of them are shown here:
NekoHTML:
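import org.cyberneko.html.parsers.DOMParser;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import java.io.IOException;

// Assumed here: urlIS is an InputStream opened on the page, document is declared elsewhere
final DOMParser parser = new DOMParser();
try {
    parser.parse(new InputSource(urlIS));
    document = parser.getDocument();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}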
TagSoup:
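import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.InputSource;
// SAX2DOM is presumably Xalan's org.apache.xalan.xsltc.trax.SAX2DOM (the post does not say)

final Parser parser = new Parser();
try {
    // SAX2DOM collects the SAX events emitted by TagSoup into a DOM tree
    final SAX2DOM sax2dom = new SAX2DOM();
    parser.setContentHandler(sax2dom);
    parser.setFeature(Parser.namespacesFeature, false);
    parser.parse(new InputSource(urlIS));
    // Read the DOM inside the try so a failed parse cannot lead to a NullPointerException
    document = sax2dom.getDOM();
} catch (Exception e) {
    e.printStackTrace();
}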
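Once either parser has given you a DOM, pulling out the specific data is usually a matter of an XPath query. Below is a minimal sketch using only the JDK's javax.xml.xpath API. The class name LinkExtractor, the helper parseWellFormed, and the choice of "all link targets" as the data to fetch are assumptions made up for illustration. parseWellFormed is only a stand-in so the example runs without extra jars; it handles well-formed markup only, and for real pages you would pass in the node produced by NekoHTML or TagSoup instead. As far as I know, NekoHTML reports element names in upper case by default, so the expression may need to be //A/@href there.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LinkExtractor {

    /** Returns the href value of every <a> element below the given DOM node. */
    public static List<String> extractLinks(Node root) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPath is case-sensitive: "//a" only matches lower-case element names
        NodeList hrefs = (NodeList) xpath.evaluate("//a/@href", root, XPathConstants.NODESET);
        List<String> links = new ArrayList<>();
        for (int i = 0; i < hrefs.getLength(); i++) {
            links.add(hrefs.item(i).getNodeValue());
        }
        return links;
    }

    /** Hypothetical stand-in for an HTML parser; accepts well-formed markup only. */
    public static Node parseWellFormed(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/a\">A</a>"
                + "<p><a href=\"/b\">B</a></p></body></html>";
        System.out.println(extractLinks(parseWellFormed(page))); // prints [/a, /b]
    }
}
```

The point of the split is that extractLinks does not care which parser built the tree, so you can swap parsers without touching the extraction logic.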
I believe the usual answer here is the quip "Even Jon Skeet cannot parse HTML using regular expressions." Depending on how complex the information you're trying to pull out of the HTML is, you may be better off with some sort of parser. What are you looking to pull, and from where?
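To make the regex problem concrete, here is a small sketch of how a typical first-attempt pattern goes wrong. The class name NaiveRegexDemo, the pattern, and the sample markup are all made up for illustration; the pattern assumes href is the first attribute and is double-quoted, which real pages do not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveRegexDemo {

    // Naive pattern: href must be the first attribute and must use double quotes
    private static final Pattern LINK = Pattern.compile("<a href=\"([^\"]*)\">");

    /** Collects whatever the naive pattern captures as a link target. */
    public static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/one\">1</a>"
                + "<a class=\"nav\" href='/two'>2</a>"      // extra attribute, single quotes
                + "<!-- <a href=\"/old\">gone</a> -->";     // commented-out link
        System.out.println(findLinks(html)); // prints [/one, /old]
    }
}
```

The pattern misses the live link /two and reports /old, which sits inside a comment. You can keep patching the expression for each new case, but a parser already handles attribute order, quoting, and comments for you.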