Hi, I am writing an application in Java and need to fetch specific data from a website. I do not know whether to use a regex or a parser. Can anybody please advise me on how to get this done, and which one is preferred?

Thanks

+1  A: 

Definitely get an HTML parser.

Here is a comparison of a few Java HTML parsers.

Some of them are shown here:

NekoHTML:

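import java.io.IOException;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// urlIS is an InputStream for the page, e.g. one opened with URL.openStream().
final DOMParser parser = new DOMParser();
Document document = null;
try {
    parser.parse(new InputSource(urlIS));
    // NekoHTML builds a DOM even from malformed HTML.
    document = parser.getDocument();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}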

TagSoup:

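// Parser is org.ccil.cowan.tagsoup.Parser; SAX2DOM is assumed here to be
// Xalan's org.apache.xalan.xsltc.trax.SAX2DOM.
// urlIS is an InputStream for the page, e.g. one opened with URL.openStream().
final Parser parser = new Parser();
Node document = null; // org.w3c.dom.Node
try {
    SAX2DOM sax2dom = new SAX2DOM();
    // TagSoup emits SAX events; SAX2DOM collects them into a DOM tree.
    parser.setContentHandler(sax2dom);
    parser.setFeature(Parser.namespacesFeature, false);
    parser.parse(new InputSource(urlIS));
    // Fetch the DOM inside the try so a failed setup cannot cause a NullPointerException.
    document = sax2dom.getDOM();
} catch (Exception e) {
    e.printStackTrace();
}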
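
Once either parser has produced a DOM, the specific data can be pulled out with the JDK's built-in XPath support. The following is only a minimal sketch: the class name `LinkExtractor`, the helper `parseWellFormed`, and the sample markup are made up for illustration, and the JDK's `DocumentBuilder` stands in for NekoHTML or TagSoup so the example runs without extra jars. With real pages you would pass in the `document` produced by one of the parsers above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LinkExtractor {

    // Collects the href value of every <a> element found under the given DOM node.
    public static List<String> extractLinks(Node document) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//a/@href", document, XPathConstants.NODESET);
        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            links.add(nodes.item(i).getNodeValue());
        }
        return links;
    }

    // Hypothetical stand-in for an HTML parser: only accepts well-formed markup.
    public static Document parseWellFormed(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/a\">A</a>"
                + "<p><a href=\"/b\">B</a></p></body></html>";
        System.out.println(extractLinks(parseWellFormed(page))); // prints [/a, /b]
    }
}
```

One caveat: XPath name tests are case-sensitive, and the case of element names in the DOM depends on the parser's configuration. As far as I know, NekoHTML upper-cases element names by default, so the expression may need to be `//A/@href` there.
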
S.Mark
+1  A: 

I believe the quote is "Even Jon Skeet cannot parse HTML using regular expressions." Depending on how complex the information you're trying to pull out of the HTML is, you may be better off with some sort of parser. What are you looking to pull, and from where?
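
To illustrate why a regex gets fragile quickly, here is a small sketch. The class name `RegexFragility`, the pattern, and the sample markup are invented for this example. The pattern assumes `href` is the first attribute, lower-case and double-quoted, so it silently misses links that are perfectly valid HTML but written differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFragility {

    // Naive pattern: expects exactly <a href="..."> with href first and double quotes.
    private static final Pattern NAIVE_LINK = Pattern.compile("<a href=\"([^\"]*)\"");

    // Returns every href the naive pattern manages to match.
    public static List<String> naiveLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = NAIVE_LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/one\">1</a>"
                + "<a class=\"x\" href=\"/two\">2</a>"  // another attribute comes first
                + "<A HREF='/three'>3</A>";              // upper case, single quotes
        System.out.println(naiveLinks(html)); // prints [/one]
    }
}
```

Each variation can be patched into the pattern, but the list of variations keeps growing (comments, line breaks inside tags, unquoted values), which is where a parser starts to pay off.
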

R0MANARMY