Hi, I am looking for a simple URL and title extractor for HTML files in Java. I am trying to parse bookmarks.html (IE, Firefox, etc.) and add the title and URL to a DB. I need to do this in Java (no third-party libraries allowed), so I probably have to use SAX/DOM/regex.
A:
You can load the file into a DOM document and then use an XPath expression to find all instances of the <A> tag. Extracting the HREF attribute and the tag contents should do what you want. The XPath would probably be something as simple as '//A'.
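As a rough sketch of the DOM plus XPath approach, using only the JDK's built-in `javax.xml` classes: the class name `BookmarkExtractor` and the method `extract` are made up for illustration. This assumes the input is well-formed XML with uppercase `A` elements and `HREF` attributes (XPath names are case-sensitive). Real bookmarks.html exports are often not well-formed, so they may need cleaning up before the parser will accept them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class BookmarkExtractor {

    /**
     * Returns one {title, url} pair per A element found in the document.
     * Assumes the input is well-formed XML; anything else is rejected.
     */
    static List<String[]> extract(String xml) {
        try {
            // Parse the markup into a DOM tree.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Select every A element anywhere in the tree.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList anchors =
                    (NodeList) xpath.evaluate("//A", doc, XPathConstants.NODESET);

            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element a = (Element) anchors.item(i);
                String title = a.getTextContent().trim(); // text inside the tag
                String url = a.getAttribute("HREF");      // "" if the attribute is absent
                result.add(new String[] { title, url });
            }
            return result;
        } catch (Exception e) {
            // Parser setup, parsing, or XPath evaluation failed.
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        String sample =
                "<DL><DT><A HREF=\"http://example.com/\">Example</A></DT></DL>";
        for (String[] entry : extract(sample)) {
            System.out.println(entry[0] + " -> " + entry[1]);
        }
    }
}
```

Each returned pair can then be written to the database with whatever JDBC code you already have. If the file uses lowercase tags, the expression and attribute name would need to change to `//a` and `href`.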
Jherico
2009-09-01 17:57:46