Friends, I have to parse a description from a URL, and the parsed content contains a few HTML tags. How can I convert it to plain text?

Thanks in advance.

+2  A: 

Use an HTML parser such as HtmlCleaner.

For a detailed answer, see: http://stackoverflow.com/questions/1699313/how-to-remove-html-tag-in-java

Ankit Jain
+1  A: 

I'd recommend running the raw HTML through JTidy, which should give you well-formed output that you can write XPath expressions against. This is the most robust way I've found of scraping HTML.
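As a rough sketch of the XPath half of that idea: the example below assumes the HTML has already been cleaned into well-formed XHTML (the JTidy step itself is not shown), and uses only the JDK's built-in DOM and XPath classes. The class name `XPathTextDemo`, the method `paragraphText`, and the `//p` expression are illustrative choices, not something from the answer above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTextDemo {
    // Hypothetical helper: returns the text of every <p> element, one per line.
    // Assumes the input is already well-formed XHTML (e.g. tidied beforehand).
    public static String paragraphText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList paragraphs = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//p", doc, XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                if (i > 0) {
                    out.append('\n');
                }
                // getTextContent() concatenates the text of all descendants,
                // so nested tags such as <b> disappear on their own.
                out.append(paragraphs.item(i).getTextContent().trim());
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }
}
```

Calling `XPathTextDemo.paragraphText("<html><body><p>Hello <b>world</b></p><p>Bye</p></body></html>")` should give the two paragraphs as plain text on separate lines.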

Jon Freedman
A: 

Just getting rid of HTML tags is simple:

// Replace each run of one or more HTML tags (with optional
// whitespace in between) with a single space character.
String strippedText = htmlText.replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", " ");

But unfortunately the requirements are never that simple:

Usually, <p> and <div> elements need separate handling, and there may be CDATA blocks or scripts containing > characters (e.g. JavaScript) that break the regex, and so on.
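A minimal sketch of how those extra cases might be handled with a few regex passes, assuming it is acceptable to drop script/style content entirely and to turn <p>, <div> and <br> into line breaks. The class name `HtmlToText` and the method `toPlainText` are made up for this example; it is still not a real parser, so attribute values containing > and entities such as &amp;amp; are not dealt with.

```java
import java.util.regex.Pattern;

public class HtmlToText {
    // Whole <script>/<style> elements, including their content.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Comments and CDATA sections, which may contain '>' characters.
    private static final Pattern COMMENT_OR_CDATA =
            Pattern.compile("(?s)<!--.*?-->|<!\\[CDATA\\[.*?\\]\\]>");
    // Tags that should become a line break rather than a space.
    private static final Pattern BLOCK_TAG =
            Pattern.compile("(?i)</?(p|div|br)\\b[^>]*>");
    // Any remaining tag.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    public static String toPlainText(String html) {
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        s = COMMENT_OR_CDATA.matcher(s).replaceAll(" ");
        s = BLOCK_TAG.matcher(s).replaceAll("\n");
        s = ANY_TAG.matcher(s).replaceAll(" ");
        // Tidy up whitespace: collapse spaces, trim around line breaks,
        // and merge consecutive line breaks into one.
        s = s.replaceAll("[ \\t]+", " ");
        s = s.replaceAll(" ?\\n ?", "\n");
        s = s.replaceAll("\\n{2,}", "\n");
        return s.trim();
    }
}
```

The order of the passes matters: scripts, comments and CDATA go first so that any > inside them cannot confuse the later tag patterns.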

seanizer
Good that you clarified all that complexity!
Ankit Jain
A: 

Getting the Text in an HTML Document
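The article behind that title is not reproduced here. As one possible way to get the text out of an HTML document with the JDK alone, here is a sketch using Swing's built-in HTML parser; whether this matches the article's approach is an assumption, and the class name `SwingHtmlText` and method `extractText` are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingHtmlText {
    // Hypothetical helper: collects every text chunk the parser reports.
    public static String extractText(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text between tags.
                text.append(data).append(' ');
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declared in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Normalize whitespace left over from joining the chunks.
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Because a real parser does the work, malformed markup is tolerated better than with a plain regex, at the cost of pulling in the Swing classes.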

camickr
A: 

You can use this single line to remove the HTML tags and display the result as plain text:

htmlString = htmlString.replaceAll("\\<.*?\\>", "");
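A small demonstration of what that one-liner does and where it stops working, wrapped in a hypothetical `StripTagsDemo` class so it can be run as-is. The sample strings are made up for illustration.

```java
public class StripTagsDemo {
    // Hypothetical wrapper around the one-line replaceAll call.
    public static String stripTags(String htmlString) {
        return htmlString.replaceAll("\\<.*?\\>", "");
    }

    public static void main(String[] args) {
        // Simple tags on a single line are removed.
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // prints "Hello world"

        // A tag that spans a line break is left in place, because '.'
        // does not match '\n' unless the (?s) flag is used.
        System.out.println(stripTags("<a\nhref=\"x\">link</a>"));
    }
}
```

Adding `(?s)` to the pattern, as in the earlier answer, is one way to cover tags that contain line breaks.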
Kandhasamy