



In my application, I use a JTextPane to display some log information. Because I want to highlight specific lines in this text (for example, error messages), I set the content type to "text/html". This way, I can format my text.

I have also created a JButton that copies the contents of this JTextPane to the clipboard. That part is easy, but my problem is that when I call myTextPane.getText(), I get HTML code such as:


    <font color="#FFCC66"><b>foobar</b></font><br>

instead of only the raw content:

    foobar

Is there a way to get only the content of my JTextPane as plain text? Or do I need to convert the HTML to raw text myself?

+1  A: 

Unfortunately, you need to do it yourself. Imagine that some of the content is HTML-specific, e.g. images: the plain-text representation is unclear. Should the alt text be included or not, for instance?

Nick Fortescue
+1  A: 

(Is a regex allowed here? This isn't really parsing, is it?)

Take the getText() result and use String.replaceAll() to filter out all tags, then call trim() to remove leading and trailing whitespace. For the whitespace between your first and your last 'blabla' I don't see a general solution. Maybe you can split the rest around CRLF and trim each String again.

(I'm no regexp expert - maybe someone can provide the regexp and earn some reputation ;) )


Note: I assumed that your text doesn't contain < or >. Otherwise, let's say it's a challenge.
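A minimal sketch of the regex approach described above, under the same assumption that the text itself contains no `<` or `>`. The class name `RegexStrip` is hypothetical. It turns `<br>` into a newline before stripping the remaining tags, then trims every line and drops lines that end up empty (which also drops intentionally blank lines). HTML entities such as `&amp;` are left as-is.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class RegexStrip {
    /** Naive tag stripper: assumes the text itself contains no '<' or '>' characters. */
    public static String stripTags(String html) {
        String noTags = html
                .replaceAll("(?i)<br\\s*/?>", "\n") // keep explicit line breaks
                .replaceAll("<[^>]*>", "");          // drop every other tag
        // Trim each line and drop the lines left empty by the HTML indentation
        return Arrays.stream(noTags.split("\\r?\\n"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String html = "<html>\n  <head>\n  </head>\n  <body>\n"
                + "    <font color=\"#FFCC66\"><b>foobar</b></font><br>\n"
                + "    second line\n  </body>\n</html>";
        // prints "foobar" and "second line" on two separate lines
        System.out.println(stripTags(html));
    }
}
```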

+1  A: 

Based on the accepted answer to "Removing HTML from a Java String":

    // needs java.io.IOException and java.io.StringReader
    MyHtml2Text parser = new MyHtml2Text();
    try {
        parser.parse(new StringReader(myTextPane.getText()));
    } catch (IOException ee) {
        // handle exception
    }
    String plainText = parser.getText();

Here is a slightly modified version of the Html2Text class from the answer I linked to:

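    import java.io.IOException;
    import java.io.Reader;
    import javax.swing.text.html.HTMLEditorKit;
    import javax.swing.text.html.parser.ParserDelegator;

    public class MyHtml2Text extends HTMLEditorKit.ParserCallback {
        private StringBuffer s;
        public MyHtml2Text() {}

        public void parse(Reader in) throws IOException {
            s = new StringBuffer();
            ParserDelegator delegator = new ParserDelegator();
            // TRUE: ignore any charset directive found in the HTML
            delegator.parse(in, this, Boolean.TRUE);
        }

        @Override
        public void handleText(char[] text, int pos) {
            // the modification: add a line break after every chunk of text
            s.append(text).append('\n');
        }

        public String getText() {
            return s.toString();
        }
    }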

If you need more fine-grained handling, consider overriding more of the callback methods defined by the HTMLEditorKit.ParserCallback class (e.g. handleStartTag, handleEndTag, handleSimpleTag).
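As an illustration of overriding more callbacks, here is a hypothetical variant (the class name `BrAwareHtml2Text` and the `toPlainText` helper are mine, not from the linked answer). Instead of adding a line break after every text chunk, it adds one only for `<br>` (an empty tag, reported through `handleSimpleTag`) and at the end of `<p>`/`<div>` blocks, so inline formatting such as `<b>` no longer splits a line. It is a sketch: other block elements such as tables and lists are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class BrAwareHtml2Text extends HTMLEditorKit.ParserCallback {
    private final StringBuilder sb = new StringBuilder();

    @Override
    public void handleText(char[] text, int pos) {
        sb.append(text); // inline text: no extra line break
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.BR) {
            sb.append('\n'); // <br> has no end tag, so it arrives here
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.P || tag == HTML.Tag.DIV) {
            sb.append('\n'); // end of a block element
        }
    }

    public static String toPlainText(String html) {
        BrAwareHtml2Text callback = new BrAwareHtml2Text();
        try {
            // TRUE: ignore any charset directive found in the HTML
            new ParserDelegator().parse(new StringReader(html), callback, Boolean.TRUE);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return callback.sb.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><font color=\"#FFCC66\"><b>foo</b>bar</font><br>baz</body></html>";
        System.out.print(toPlainText(html));
    }
}
```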

+1  A: 

No need to use the ParserCallback. Just use:

    textPane.getDocument().getText(0, textPane.getDocument().getLength());
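Document.getText declares BadLocationException, so in real code the one-liner above needs a try/catch. A minimal self-contained sketch follows; the class `PlainTextFromPane` and helper `plainText` are hypothetical names, and `main` builds the pane off the Event Dispatch Thread only for brevity.

```java
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

public class PlainTextFromPane {
    /** Returns the document's character content, without any HTML markup. */
    public static String plainText(JTextPane pane) {
        Document doc = pane.getDocument();
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            // the range [0, length] is always valid, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        JTextPane pane = new JTextPane();
        pane.setContentType("text/html");
        pane.setText("<html><body><font color=\"#FFCC66\"><b>foobar</b></font></body></html>");
        System.out.println(plainText(pane).trim());
    }
}
```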
That's indeed a really good solution... except that I lost all the line breaks, and my final String ends up on a single line. Too bad, because I really liked this solution!
Yes, the Document doesn't store line breaks; they were added manually by the other solution.