My HTML contains tags of the following form:
<div class="author"><a href="/user/1" title="View user profile.">Apple</a> - October 22, 2009 - 01:07</div>
I'd like to extract the date ("October 22, 2009 - 01:07" in this example) from each such tag.
I've implemented javax.swing.text.html.HTMLEditorKit.ParserCallback as follows:
class HTMLParseListerInner extends HTMLEditorKit.ParserCallback {
    private ArrayList<String> foundDates = new ArrayList<String>();
    private boolean isDivLink = false;

    public void handleText(char[] data, int pos) {
        if (isDivLink)
            foundDates.add(new String(data)); // Extracts "Apple" instead of the date.
    }

    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        String divValue = (String) a.getAttribute(HTML.Attribute.CLASS);
        if (t.toString() == "div" && divValue != null && divValue.equals("author"))
            isDivLink = true;
    }
}
However, the parser above returns "Apple", which is the text of the hyperlink inside the tag. How can I fix the parser to extract the date instead?
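
One possible direction, offered as a minimal sketch rather than a definitive fix: the callback above sets isDivLink when the div opens but never clears it, and it does not distinguish text inside the nested <a> from text that follows it. The sketch below tracks both conditions, buffers only the text seen outside the link, and strips the leading "- " separator when the div closes. The class name AuthorDateExtractor, the helper extractDates, and the separator-stripping regex are assumptions made for illustration and are not part of the original code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical replacement for HTMLParseListerInner (name is illustrative).
class AuthorDateExtractor extends HTMLEditorKit.ParserCallback {
    private final List<String> foundDates = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean inAuthorDiv = false; // inside <div class="author">
    private int nestedDivs = 0;          // divs opened inside the author div
    private int linkDepth = 0;           // > 0 while inside an <a> in that div

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.DIV) {
            if (inAuthorDiv) {
                nestedDivs++;
            } else if ("author".equals(a.getAttribute(HTML.Attribute.CLASS))) {
                inAuthorDiv = true;
                current.setLength(0);
            }
        } else if (t == HTML.Tag.A && inAuthorDiv) {
            linkDepth++;
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (!inAuthorDiv) {
            return;
        }
        if (t == HTML.Tag.A && linkDepth > 0) {
            linkDepth--;
        } else if (t == HTML.Tag.DIV) {
            if (nestedDivs > 0) {
                nestedDivs--;
            } else {
                // Assumption: the date is preceded by a "- " separator.
                String date = current.toString().trim().replaceFirst("^-\\s*", "");
                if (!date.isEmpty()) {
                    foundDates.add(date);
                }
                inAuthorDiv = false;
                linkDepth = 0;
            }
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Keep only text that is inside the author div but outside the link.
        if (inAuthorDiv && linkDepth == 0) {
            current.append(data);
        }
    }

    // Hypothetical convenience method: parse an HTML string, return the dates.
    static List<String> extractDates(String html) {
        AuthorDateExtractor callback = new AuthorDateExtractor();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.foundDates;
    }
}
```

Calling AuthorDateExtractor.extractDates(html) on the example div should then yield the date string without the author name. Comparing against the HTML.Tag.DIV and HTML.Tag.A constants also avoids relying on == between strings, as the original t.toString() == "div" check does.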