ansaurus

Question

Java (Android) regular expression to strip out HTML paragraph

Answer 1

A:

If the markup is simple, just use a regex.

If you are getting the XML from an external source that you control, I would clean it up there instead.

Macarse 2010-04-18 21:40:25

Answer 2

+1 A:

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

So, I'll ask the question that wraps up the linked-to answer: have you tried using an XML parser instead?

You might get some ideas from some of the other answers there, too, but I'd try to avoid the regex path. As Macarse suggested, clean this up on the server if you can. If not, wrap those three <p> elements in a single root element and parse the result using SAX or something similar, paying attention to the second paragraph element.
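A minimal sketch of the wrap-and-parse idea, assuming the input is three sibling <p> elements that are well-formed XML. The class name SecondParagraphSax, the method secondParagraph, and the wrapper element name "root" are made up for illustration; real-world HTML (unclosed tags, entities such as &nbsp;) would make the parser throw.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of the second <p> element.
class SecondParagraphSax {

    static String secondParagraph(String fragment) {
        // Wrap the sibling <p> elements in one root so the input is a
        // well-formed XML document ("root" is an arbitrary name).
        String xml = "<root>" + fragment + "</root>";
        final StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private int paragraphCount = 0;
            private boolean inSecond = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("p".equals(qName)) {
                    paragraphCount++;
                    inSecond = (paragraphCount == 2);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("p".equals(qName)) {
                    inSecond = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append.
                if (inSecond) {
                    text.append(ch, start, length);
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Malformed input, parser configuration problems, and so on.
            throw new IllegalArgumentException("Could not parse fragment", e);
        }
        return text.toString();
    }
}
```

Calling SecondParagraphSax.secondParagraph("<p>one</p><p>two</p><p>three</p>") should yield "two". Text inside inline elements nested in the second paragraph is collected as well, since the flag stays set until the closing </p>.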

CommonsWare 2010-04-18 21:52:23

Answer 3

A:

Just do a split (http://developer.android.com/reference/java/lang/String.html#split(java.lang.String)) on "</p><p>" and take the second entry in the returned array. That would actually do it pretty quickly.
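As a rough sketch of the split approach, assuming the input is exactly three <p> elements with no whitespace between the tags (the class and method names are invented for this example):

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the split-based approach.
class ParagraphSplit {

    // Returns the second piece after splitting on "</p><p>",
    // or null if there is no second piece.
    static String secondParagraph(String html) {
        // String.split takes a regex; Pattern.quote makes the
        // delimiter a literal so no character is treated specially.
        String[] parts = html.split(Pattern.quote("</p><p>"));
        return parts.length >= 2 ? parts[1] : null;
    }
}
```

For "<p>one</p><p>two</p><p>three</p>" the pieces are "<p>one", "two" and "three</p>", so the second entry is "two". With only two paragraphs the second entry would still carry a trailing "</p>", and any whitespace or attributes in the tags would stop the delimiter from matching, so this only fits input whose shape is known in advance.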

jqpubliq 2010-04-18 21:56:55

Answer 4

A:

If you are going to parse an XML file downloaded from a website, then this has nothing to do with Android specifically; it is an ordinary Java parsing problem.

Kavin 2010-04-19 00:29:21

Answer 5

A:

The regex would probably look something like this: .*?>(.*?)<.* You then access the grouped content by calling group(1) on the Matcher object.
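A small sketch of how that pattern behaves with java.util.regex, assuming single-line input (the class and method names are hypothetical). Note that the lazy quantifiers make group(1) capture the text between the first '>' and the next '<', so on multi-paragraph input it returns the first paragraph's content.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper wrapping the regex from the answer.
class FirstTagContent {

    // Compiled once; '.' does not match line breaks by default,
    // so multi-line input would need Pattern.DOTALL.
    private static final Pattern TAG_CONTENT =
            Pattern.compile(".*?>(.*?)<.*");

    // Returns the captured group, or null if the input does not match.
    static String extract(String html) {
        Matcher m = TAG_CONTENT.matcher(html);
        // matches() requires the whole input to fit the pattern.
        return m.matches() ? m.group(1) : null;
    }
}
```

FirstTagContent.extract("<p>Hello</p>") gives "Hello", while input without any tags gives null because the pattern needs both a '>' and a later '<'.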

picknick 2010-04-19 11:08:41
