Friends, I have to parse a description from a URL, and the parsed content contains a few HTML tags. How can I convert it to plain text?
Thanks in advance.
Use an HTML parser such as HtmlCleaner.
For a detailed answer, see: http://stackoverflow.com/questions/1699313/how-to-remove-html-tag-in-java
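To illustrate the parser-based approach without pulling in a third-party jar, here is a minimal sketch that uses the HTML parser bundled with the JDK (javax.swing.text.html) rather than HtmlCleaner. The class name ParserHtmlToText and the method toPlainText are made up for this example, and putting a space between text chunks is my own simplification.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects only the text nodes reported by the JDK's HTML parser.
public class ParserHtmlToText {

    public static String toPlainText(String html) {
        final StringBuilder out = new StringBuilder();

        // The parser calls handleText once for every run of text between tags.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data).append(' ');
            }
        };

        try {
            // true = ignore any charset declared inside the document itself
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }

        // Collapse the whitespace introduced between chunks.
        return out.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Because a space is added after every chunk, a word that is split by an inline tag (for example wor<b>ld</b>) comes out as two words. A dedicated library such as HtmlCleaner or jTidy is likely the better choice for messy real-world pages.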
I'd recommend running the raw HTML through JTidy, which should give you well-formed output that you can write XPath expressions against. This is the most robust way I've found of scraping HTML.
Just getting rid of HTML tags is simple:
// replace all occurrences of one or more HTML tags (with optional
// whitespace in between) with a single space character
String strippedText = htmlText.replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", " ");
But unfortunately the requirements are never that simple:
Usually, <p> and <div> elements need separate handling, and there may be CDATA blocks with > characters (e.g. JavaScript) that mess up the regex, etc.
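A rough sketch of how those extra cases could be handled with plain regexes is below. The class name HtmlToText, the list of tags treated as block-level, and the choice to drop script, style, comment and CDATA content entirely are all assumptions for illustration, not a complete solution.

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex-based stripping with special cases for
// hidden content and block-level tags.
public class HtmlToText {

    // <script>/<style> elements, comments and CDATA sections, including their content.
    private static final Pattern HIDDEN = Pattern.compile(
        "(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>|<!--.*?-->|<!\\[CDATA\\[.*?\\]\\]>");

    // Opening or closing tags that should end the current line of text.
    private static final Pattern BLOCK = Pattern.compile(
        "(?i)</?(p|div|br|li|tr|h[1-6])\\b[^>]*>");

    // Whatever tags are left after the two passes above.
    private static final Pattern ANY_TAG = Pattern.compile("(?s)<[^>]*>");

    public static String toPlainText(String html) {
        String text = HIDDEN.matcher(html).replaceAll(" ");
        text = BLOCK.matcher(text).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll(" ");
        // Collapse runs of spaces and tabs, then runs of blank lines.
        text = text.replaceAll("[ \\t\\x0B\\f\\r]+", " ");
        text = text.replaceAll(" ?\\n[ \\n]*", "\n");
        return text.trim();
    }
}
```

Removing the script block first is what keeps the > inside the JavaScript from confusing the generic tag pattern. Entities such as &amp; are left untouched here, and attribute values that contain > will still break it, so a real parser remains the safer option.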
You can use this single line to remove the HTML tags and display the result as plain text:
htmlString = htmlString.replaceAll("\\<.*?\\>", "");