I have an Android application that grabs some data from an external XML source. I've pulled some HTML out of one of the XML elements, and it's in this format:

<p class="x">Some text...</p>
<p>Some more text</p>
<p>Some final text</p>

I want to extract the middle paragraph text, how can I do this? Would a regular expression be the best way? I don't really want to start including external HTML parsing libraries.

A: 

If it's simple, just do a regex.

If you are getting XML from an external source that you own, I would parse it there.

Macarse
A: 

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

So, I'll ask the question that wraps up the linked-to answer: have you tried using an XML parser instead?

You might get some ideas from some of the other answers there, too, but I'd try to avoid the regex path. As Macarse suggested, clean this up on the server if you can. If not, wrap those three <p> elements in a single root element, parse the result with SAX or a similar XML parser, and pick out the second paragraph element.
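A minimal sketch of the wrap-and-parse idea, assuming the fragment is well-formed XML once wrapped (no HTML-only entities such as &nbsp;). It uses the standard javax.xml.parsers / org.xml.sax API, which Android also ships; the class and method names and the synthetic <root> element are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxParagraphs {

    // SAX handler that collects the text of the n-th <p> element (1-based).
    static final class NthParagraphHandler extends DefaultHandler {
        private final int target;
        private final StringBuilder text = new StringBuilder();
        private int seen = 0;
        private boolean inTarget = false;
        private boolean found = false;

        NthParagraphHandler(int target) {
            this.target = target;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("p".equals(qName)) {
                seen++;
                inTarget = (seen == target);
                found |= inTarget;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("p".equals(qName)) {
                inTarget = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks, so append.
            if (inTarget) {
                text.append(ch, start, length);
            }
        }

        String result() {
            return found ? text.toString() : null;
        }
    }

    // Wraps the fragment in a synthetic <root> element (hypothetical name) so it
    // is well-formed XML, then returns the n-th paragraph's text, or null if absent.
    static String nthParagraph(String fragment, int n) {
        String xml = "<root>" + fragment + "</root>";
        NthParagraphHandler handler = new NthParagraphHandler(n);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
        return handler.result();
    }

    public static void main(String[] args) {
        String fragment = "<p class=\"x\">Some text...</p>\n"
                + "<p>Some more text</p>\n"
                + "<p>Some final text</p>";
        System.out.println(nthParagraph(fragment, 2)); // Some more text
    }
}
```

Because the handler only tracks <p> boundaries, inline tags inside the target paragraph (e.g. <b>) don't stop text collection; their text is simply concatenated.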

CommonsWare
A: 

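Just do a split (http://developer.android.com/reference/java/lang/String.html#split(java.lang.String)) on "</p><p>" and take the second entry in the returned array; that would do it pretty quickly. Note that split takes a regex and the sample has line breaks between the paragraphs, so a separator like </p>\s*<p> is safer than the literal "</p><p>".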
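A rough sketch of the split approach, assuming exactly this shape of input (at least three paragraphs, middle <p> without attributes). The class and method names are hypothetical; the separator allows whitespace between paragraphs as described above.

```java
class SplitParagraphs {

    // String.split takes a regular expression. Splitting on "</p>" + optional
    // whitespace + "<p>" tolerates the line breaks between the sample's paragraphs.
    // For the sample the pieces are:
    //   [0] "<p class=\"x\">Some text..."  [1] "Some more text"  [2] "Some final text</p>"
    static String middleParagraph(String html) {
        String[] parts = html.split("</p>\\s*<p>");
        // parts[1] is a clean body only when another separator follows it.
        return parts.length > 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        String html = "<p class=\"x\">Some text...</p>\n"
                + "<p>Some more text</p>\n"
                + "<p>Some final text</p>";
        System.out.println(middleParagraph(html)); // Some more text
    }
}
```

This is the quickest option, but it is brittle: an attribute on the middle <p> or a different paragraph count breaks it.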

jqpubliq
A: 

If you're parsing an XML file downloaded from a website, this isn't really an Android-specific problem.

Kavin
A: 

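The regex would probably look something like .*?>(.*?)<.* and you access the grouped content by calling group(1) on the Matcher object. Be aware that, used with find(), that pattern captures the first paragraph, so for the middle one you need a pattern that matches each <p>...</p> in turn and read group(1) after the second match.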
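A minimal java.util.regex sketch of reading the n-th paragraph. The pattern <p(?:\s[^>]*)?>(.*?)</p> is my own variation, chosen so it tolerates the class attribute and line breaks; the class and method names are hypothetical. It does not handle nested or malformed markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexParagraphs {

    // "<p", then optionally whitespace + attributes, then ">", then a lazily
    // captured body up to the first "</p>". DOTALL lets "." match line breaks
    // inside a paragraph body.
    private static final Pattern PARAGRAPH =
            Pattern.compile("<p(?:\\s[^>]*)?>(.*?)</p>", Pattern.DOTALL);

    // Returns the body of the n-th paragraph (1-based), or null if there are fewer.
    static String nthParagraph(String html, int n) {
        Matcher m = PARAGRAPH.matcher(html);
        // Each find() advances to the next <p>...</p> in the input.
        for (int i = 1; m.find(); i++) {
            if (i == n) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<p class=\"x\">Some text...</p>\n"
                + "<p>Some more text</p>\n"
                + "<p>Some final text</p>";
        System.out.println(nthParagraph(html, 2)); // Some more text
    }
}
```

Compiling the Pattern once as a static field avoids recompiling it on every call.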

picknick