I am using htmlparser (htmlparser.org) to rewrite all the links in an input String.

All I need to do is iterate over all the link tags (<a href=...>) that appear in the input String, grab their href values, apply some regexes to decide how each link should be manipulated, and then update the link's href, target and onclick attributes accordingly.
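The "regex to decide how to manipulate it" step can be kept separate from whatever does the parsing. Below is a minimal sketch in plain Java (no htmlparser types) that maps an href to the attributes that should be set on its link. The three rules, the `LinkRules` class and the `trackDownload(this)` handler are hypothetical placeholders for illustration, not anything defined by the question or by htmlparser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical link rules: every pattern and attribute value below is a
 * placeholder chosen for illustration, not part of htmlparser.
 */
final class LinkRules {
    // Absolute http/https URLs count as "external" in this sketch.
    private static final Pattern EXTERNAL =
            Pattern.compile("^https?://", Pattern.CASE_INSENSITIVE);
    // Plain http:// links are upgraded to https:// (hypothetical href rewrite).
    private static final Pattern INSECURE =
            Pattern.compile("^http://", Pattern.CASE_INSENSITIVE);
    // Links to PDFs get a hypothetical tracking handler in onclick.
    private static final Pattern PDF =
            Pattern.compile("\\.pdf$", Pattern.CASE_INSENSITIVE);

    private LinkRules() {}

    /** Returns attribute name -> new value; an empty map means "leave the link alone". */
    static Map<String, String> updatesFor(String href) {
        Map<String, String> updates = new LinkedHashMap<>();
        if (href == null) {
            return updates;
        }
        if (INSECURE.matcher(href).find()) {
            updates.put("href", INSECURE.matcher(href).replaceFirst("https://"));
        }
        if (EXTERNAL.matcher(href).find()) {
            updates.put("target", "_blank");
        }
        if (PDF.matcher(href).find()) {
            updates.put("onclick", "trackDownload(this)");
        }
        return updates;
    }
}
```

Each entry in the returned map corresponds to one attribute to set on the link, so the same rules can be reused whichever way the tag itself ends up being edited.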

I am not sure exactly how I can update only the selected link elements in the input String while leaving all other data in the input String untouched.

It seems that the htmlparser library can extract certain elements for manipulation, but it can't manipulate them in their original context and then return the updated result while preserving the integrity of that original context.
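One way to sidestep re-serialising the document entirely is to edit only the matched <a> start tags and copy every other character through unchanged. The sketch below does that with java.util.regex instead of htmlparser, so it is a swapped-in technique, not the library's approach. It assumes reasonably well-formed markup: no `>` inside attribute values, no attribute names appearing inside other attribute values, and no <a> tags inside comments, scripts or CDATA. `InPlaceLinkRewriter` and its methods are hypothetical names; the rules are passed in as a function from href to the attributes to set.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex-based sketch: rewrites only opening <a ...> tags, leaves the rest verbatim. */
final class InPlaceLinkRewriter {
    // An opening <a> tag; \b stops it from matching <abbr>, <area>, etc.
    private static final Pattern A_START_TAG =
            Pattern.compile("<a\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    private InPlaceLinkRewriter() {}

    /** Matches ' name="v"', " name='v'" or ' name=v' (leading whitespace included). */
    private static Pattern attribute(String name) {
        return Pattern.compile(
                "\\s" + Pattern.quote(name) + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
                Pattern.CASE_INSENSITIVE);
    }

    /** Returns the attribute's value, or null if the tag does not have it. */
    static String getAttribute(String tag, String name) {
        Matcher m = attribute(name).matcher(tag);
        if (!m.find()) {
            return null;
        }
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    /** Replaces the attribute if present, otherwise inserts it before the closing '>'. */
    static String setAttribute(String tag, String name, String value) {
        String attr = " " + name + "=\"" + value.replace("\"", "&quot;") + "\"";
        Matcher m = attribute(name).matcher(tag);
        if (m.find()) {
            return tag.substring(0, m.start()) + attr + tag.substring(m.end());
        }
        int end = tag.endsWith("/>") ? tag.length() - 2 : tag.length() - 1;
        return tag.substring(0, end) + attr + tag.substring(end);
    }

    /** Applies the rules to each <a> start tag; all other text is copied unchanged. */
    static String rewriteLinks(String html, Function<String, Map<String, String>> rules) {
        Matcher m = A_START_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group();
            Map<String, String> updates = rules.apply(getAttribute(tag, "href"));
            for (Map.Entry<String, String> e : updates.entrySet()) {
                tag = setAttribute(tag, e.getKey(), e.getValue());
            }
            // quoteReplacement keeps '$' and '\' in the tag from being treated as group refs
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because `Matcher.appendReplacement` and `appendTail` copy the text between matches verbatim, everything outside the rewritten start tags comes back byte-for-byte identical. A regex cannot cope with every piece of real-world HTML, though, so a real parser remains the safer choice when the input is messy.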

Any help would be greatly appreciated.

Thanks

A: 

This is a very simple example, but it shows you how to set up a node visitor.

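    import java.net.URI;
    import java.net.URISyntaxException;

    import org.htmlparser.Parser;
    import org.htmlparser.Tag;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.visitors.NodeVisitor;

    public static String setExternalLinkTargets(String html) {
        final NodeVisitor linkVisitor = new NodeVisitor() {

            @Override
            public void visitTag(Tag tag) {
                // Called for every tag in the parsed HTML; only <a> tags are changed
                String name = tag.getTagName();

                // Set the link's target to _blank if the href is external
                if ("a".equalsIgnoreCase(name)) {
                    String href = tag.getAttribute("href");
                    if (href != null && isExternalLink(href)) {
                        tag.setAttribute("target", "_blank");
                    }
                }
            }
        };

        Parser parser = Parser.createParser(html, null);
        try {
            NodeList list = parser.parse(null);
            list.visitAllNodesWith(linkVisitor);
            // Serialise the (modified) node list back to an HTML string
            return list.toHtml();
        } catch (ParserException e) {
            // Could not parse HTML, return original HTML
            return html;
        }
    }

    // Placeholder rule (your own helper, not part of htmlparser): treat any
    // absolute URL that has a host as external. Compare the host against your
    // own domain instead if links to your site can be absolute.
    private static boolean isExternalLink(String href) {
        try {
            return new URI(href.trim()).getHost() != null;
        } catch (URISyntaxException e) {
            return false; // malformed href: leave the link alone
        }
    }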
empire29