views:

1061

answers:

6

Given the string "<table><tr><td>Hello World!</td></tr></table>", what is the (easiest) way to get a DOM Element representing it?

+2  A: 

You could use Swing:

How do you make use of the HTML-processing capabilities that are built into Java? You may not know that Swing contains all the classes necessary to parse HTML. Jeff Heaton shows you how.
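
As a rough sketch of what using Swing's built-in parser might look like, the snippet below reads the question's string into an `HTMLDocument` and lists the element names it finds. The names `SwingHtmlTree`, `parse`, and `elementNames` are made up for illustration. Note that this yields a tree of `javax.swing.text.Element` objects, not an `org.w3c.dom.Element`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not taken from the linked article.
class SwingHtmlTree {

    // Parse an HTML string into Swing's document model.
    static HTMLDocument parse(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    // Walk the element tree depth-first and collect the element names.
    static List<String> elementNames(HTMLDocument doc) {
        List<String> names = new ArrayList<>();
        ElementIterator it = new ElementIterator(doc);
        for (Element e = it.next(); e != null; e = it.next()) {
            names.add(e.getName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        HTMLDocument doc = parse("<table><tr><td>Hello World!</td></tr></table>");
        System.out.println(elementNames(doc));
        System.out.println(doc.getText(0, doc.getLength()).trim());
    }
}
```

The Swing parser fills in missing structure such as `html` and `body`, so the list contains more names than the input string does.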

Andrew Hare
+3  A: 

You could use HTML Parser, which is a Java library used to parse HTML in either a linear or nested fashion. It is an open source tool and can be found on SourceForge.

nkr1pt
+1  A: 

I've used Jericho HTML Parser. It's open source, tolerates badly formatted tags, and is lightweight.

non sequitor
+1  A: 

Here's a way:

import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class HtmlParseDemo {
   public static void main(String [] args) throws Exception {
       Reader reader = new StringReader("<table><tr><td>Hello</td><td>World!</td></tr></table>");
       HTMLEditorKit.Parser parser = new ParserDelegator();
       parser.parse(reader, new HTMLTableParser(), true);
       reader.close();
   }
}

class HTMLTableParser extends HTMLEditorKit.ParserCallback {

    // True while the parser is between <tr> and </tr>.
    private boolean encounteredATableRow = false;

    @Override
    public void handleText(char[] data, int pos) {
        // Print only text that appears inside a table row.
        if(encounteredATableRow) System.out.println(new String(data));
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if(t == HTML.Tag.TR) encounteredATableRow = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if(t == HTML.Tag.TR) encounteredATableRow = false;
    }
}
Bart Kiers
A: 

This code works:

public static void main(String args[]) {
    String string = "<table><tr><td>Hello World!</td></tr></table>";
    int i = 0;
    while (countChars(string, '<') != 0)
    {
        string = string.replaceAll("</?.{" + i + "}>", "");
        i++;
    }
    System.out.println(string);
}

public static int countChars(String s, char c)
{
    int count = 0;
    for (int i = 0; i < s.length(); i++)
    {
        if (s.charAt(i) == c)
        {
            count++;
        }
    }
    return count;
}

I'm not sure the code will work for every HTML string you try.
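
For comparison, here is a minimal sketch of the same tag-stripping idea done in a single pass, assuming a tag never contains a `>` inside an attribute value. `TagStripper` and `stripTags` are illustrative names. Like the loop above, this returns plain text rather than a DOM element, and it is not a real HTML parser.

```java
// Hypothetical helper, shown only to illustrate regex-based tag stripping.
class TagStripper {

    // Remove anything that looks like <...>.
    // This breaks on comments, scripts, or a '>' inside an attribute value.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<table><tr><td>Hello World!</td></tr></table>"));
        // prints: Hello World!
    }
}
```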

Martijn Courteaux
Why -1?? It really works!!
Martijn Courteaux
It wasn't me who downvoted, but I cannot see how your answer even comes close to answering the question. The OP wants to parse to a DOM element ...
Stephen C
I'm only 14 years old. I speak Dutch and live in Belgium.
Martijn Courteaux
"I'm only 14 years old. I speak Dutch and live in Belgium." I don't see how that matters; are you looking for pity or something?
geowa4
I mean that I don't always understand everything correctly.
Martijn Courteaux
+1  A: 

I found this somewhere (don't remember where):

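import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static DocumentFragment parseXml(Document doc, String fragment)
{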
    // Wrap the fragment in an arbitrary element.
    fragment = "<fragment>"+fragment+"</fragment>";
    try
    {
        // Create a DOM builder and parse the fragment.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document d = factory.newDocumentBuilder().parse(
                new InputSource(new StringReader(fragment)));

        // Import the nodes of the new document into doc so that they
        // will be compatible with doc.
        Node node = doc.importNode(d.getDocumentElement(), true);

        // Create the document fragment node to hold the new nodes.
        DocumentFragment docfrag = doc.createDocumentFragment();

        // Move the nodes into the fragment.
        while (node.hasChildNodes())
        {
            docfrag.appendChild(node.removeChild(node.getFirstChild()));
        }
        // Return the fragment.
        return docfrag;
    }
    catch (SAXException e)
    {
        // A parsing error occurred; the XML input is not valid.
    }
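    catch (ParserConfigurationException e)
    {
        // The factory could not create a parser with its current configuration.
    }
    catch (IOException e)
    {
        // Reading the input failed (unlikely for an in-memory string).
    }
    // Parsing failed; callers must check for null.
    return null;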
}
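
If only the single root element is needed, the wrapping step can be skipped. Here is a minimal sketch, assuming the markup is well-formed XML (real-world HTML often is not, in which case the parser throws a `SAXException`). `DomFromString` and `parseElement` are illustrative names, not part of the snippet above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class DomFromString {

    // Parse a well-formed markup string and return its root DOM element.
    static Element parseElement(String markup) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element table = parseElement("<table><tr><td>Hello World!</td></tr></table>");
        System.out.println(table.getTagName() + ": " + table.getTextContent());
        // prints: table: Hello World!
    }
}
```

For markup that is not well-formed, one of the lenient HTML parsers mentioned in the other answers is likely a better fit than an XML parser.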
IttayD