Given the string "<table><tr><td>Hello World!</td></tr></table>", what is the (easiest) way to get a DOM Element representing it?
views: 1061
answers: 6
+2
A:
How do you make use of the HTML-processing capabilities that are built into Java? You may not know that Swing contains all the classes necessary to parse HTML. Jeff Heaton shows you how.
Andrew Hare
2009-09-30 13:02:50
+3
A:
You could use HTML Parser, a Java library for parsing HTML in either a linear or nested fashion. It is an open source tool and can be found on SourceForge.
nkr1pt
2009-09-30 13:03:13
+1
A:
I've used Jericho HTML Parser. It's OSS, detects (and forgives) badly formatted tags, and is lightweight.
non sequitor
2009-09-30 13:10:07
+1
A:
Here's a way:
import java.io.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class HtmlParseDemo {
    public static void main(String[] args) throws Exception {
        Reader reader = new StringReader("<table><tr><td>Hello</td><td>World!</td></tr></table>");
        HTMLEditorKit.Parser parser = new ParserDelegator();
        // The third argument tells the parser to ignore any charset directive.
        parser.parse(reader, new HTMLTableParser(), true);
        reader.close();
    }
}

// Prints every run of text that appears inside a table row.
class HTMLTableParser extends HTMLEditorKit.ParserCallback {
    private boolean encounteredATableRow = false;

    @Override
    public void handleText(char[] data, int pos) {
        if (encounteredATableRow) System.out.println(new String(data));
    }

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TR) encounteredATableRow = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TR) encounteredATableRow = false;
    }
}
Bart Kiers
2009-09-30 13:10:58
A:
This code works:
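public class StripTags
{
    public static void main(String args[])
    {
        String string = "<table><tr><td>Hello World!</td></tr></table>";
        // Remove tags whose content is i characters long, for increasing i,
        // until no '<' is left. The bound on i stops the loop when a stray
        // '<' is never closed by a '>'.
        int i = 0;
        while (countChars(string, '<') != 0 && i <= string.length())
        {
            string = string.replaceAll("</?.{" + i + "}>", "");
            i++;
        }
        System.out.println(string);
    }

    // Counts how often the character c occurs in s.
    public static int countChars(String s, char c)
    {
        int count = 0;
        for (int i = 0; i < s.length(); i++)
        {
            if (s.charAt(i) == c)
            {
                count++;
            }
        }
        return count;
    }
}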
I'm not sure the code will work for every HTML string you try.
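For comparison, the same tag stripping can be sketched with a single regular expression. This is only an illustration: the class name TagStripper and the method stripTags are made up here, and the approach shares the limits of the loop above. It returns plain text rather than a DOM Element, and it can misfire on comments, scripts, or attribute values that contain '>'.

```java
final class TagStripper {
    // Hypothetical helper: removes every substring that looks like <...>.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>Hello World!</td></tr></table>";
        System.out.println(stripTags(html)); // prints "Hello World!"
    }
}
```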
Martijn Courteaux
2009-09-30 13:12:25
Why -1? It really works!
Martijn Courteaux
2009-09-30 13:40:27
It wasn't me who down voted. But I cannot see how your answer even comes close to answering the question. The OP wants to parse to a DOM element ...
Stephen C
2009-09-30 14:58:01
I'm only 14 years old. I speak Dutch and live in Belgium.
Martijn Courteaux
2009-10-01 15:23:46
"I'm only 14 years old. I speak Dutch and live in Belgium." I don't see how that matters; are you looking for pity or something?
geowa4
2009-10-21 12:23:51
I mean that I don't understand everything correctly.
Martijn Courteaux
2009-10-21 18:18:03
+1
A:
I found this somewhere (don't remember where):
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static DocumentFragment parseXml(Document doc, String fragment)
{
    // Wrap the fragment in an arbitrary element.
    fragment = "<fragment>" + fragment + "</fragment>";
    try
    {
        // Create a DOM builder and parse the fragment.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document d = factory.newDocumentBuilder().parse(
                new InputSource(new StringReader(fragment)));
        // Import the nodes of the new document into doc so that they
        // will be compatible with doc.
        Node node = doc.importNode(d.getDocumentElement(), true);
        // Create the document fragment node to hold the new nodes.
        DocumentFragment docfrag = doc.createDocumentFragment();
        // Move the nodes into the fragment.
        while (node.hasChildNodes())
        {
            docfrag.appendChild(node.removeChild(node.getFirstChild()));
        }
        // Return the fragment.
        return docfrag;
    }
    catch (SAXException e)
    {
        // A parsing error occurred; the XML input is not valid.
    }
    catch (ParserConfigurationException e)
    {
        // The XML parser could not be configured.
    }
    catch (IOException e)
    {
        // Reading the input failed; unlikely with a StringReader.
    }
    // Reached only when one of the exceptions above was swallowed.
    return null;
}
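When the string is well-formed XML, as the one in the question happens to be, a shorter route to a DOM Element is to parse it directly and take the document element. The following is a minimal sketch under that assumption; the class name DomFromString and the method toElement are made up for illustration, and real-world HTML that is not well-formed will be rejected by the XML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DomFromString {
    // Hypothetical helper: parses a well-formed markup string and
    // returns the root element of the resulting DOM document.
    static Element toElement(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)))
                          .getDocumentElement();
        } catch (Exception e) {
            // Covers parser configuration, I/O, and well-formedness errors.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element table = toElement("<table><tr><td>Hello World!</td></tr></table>");
        System.out.println(table.getTagName());     // prints "table"
        System.out.println(table.getTextContent()); // prints "Hello World!"
    }
}
```

Unlike parseXml above, this sketch fails loudly instead of returning null, so a malformed string is not mistaken for an empty result.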
IttayD
2009-10-02 12:28:47