ansaurus

Question

Parse HTML in Android

Answer 1

A:

You can try the XmlPullParser that ships with Android. You can use a StringBuffer to collect the characters between tags.

Rahul 2010-09-02 08:52:37

Answer 2

A:

Try using a regex to extract the information you want: http://java.sun.com/developer/technicalArticles/releases/1.4regex/

You could even use it to remove the hidden characters. Or maybe use String.replace to remove the newline characters?

BeRecursive 2010-09-02 10:42:55

I tried String.replaceAll("\n") but it still gave me issues.

Alejandro Huerta 2010-09-02 20:32:19

I figured out where I went wrong with the replaceAll("\n", "") and it worked well, thank you.

Alejandro Huerta 2010-09-03 03:28:31
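
A minimal sketch of the regex approach from this answer, for anyone following along. It assumes the value sits in a `td` cell with class `bodybox` (the class name is borrowed from the Jsoup answer); the sample markup and the `RegexExtract` class are hypothetical. Note that `replaceAll` takes two arguments, a regex and a replacement. Regexes are brittle on real-world HTML, so this probably only suits small, predictable fragments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {
    // Captures the contents of <td class="bodybox">...</td>.
    // DOTALL lets '.' match line breaks inside the cell.
    private static final Pattern CELL = Pattern.compile(
            "<td\\s+class=\"bodybox\"\\s*>(.*?)</td>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    public static List<String> extractCells(String html) {
        List<String> cells = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // Remove line breaks inside the cell, then trim the ends.
            String text = m.group(1).replaceAll("[\\r\\n]+", "").trim();
            cells.add(text);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Hypothetical fragment, not taken from a real page.
        String html = "<tr><td class=\"bodybox\">\n21,670,510,504 $\n</td></tr>";
        System.out.println(extractCells(html));
    }
}
```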

Answer 3

+2 A:

If you want to make parsing very easy, try Jsoup:

This example downloads the page, parses it, and gets the text of each matching table cell.

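import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Download and parse the page (get() throws IOException on failure).
Document doc = Jsoup.connect("http://jsoup.org").get();
// Select every <td> element with the class "bodybox".
Elements tds = doc.select("td.bodybox");
for (Element td : tds) {
  String tdText = td.text();
}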

droidgren 2010-09-02 13:47:47

Jsoup is working out pretty well, thank you. The only issue I can see at the moment is that it gives me extra characters at the beginning and end, for instance: "Â 21,670,510,504 $Â". I believe it is because of the "&nbsp;" within the HTML. Is there any way to have Jsoup delete that?

Alejandro Huerta 2010-09-02 20:30:39

Alejandro Huerta 2010-09-03 03:27:41

You can also replace afterwards: `tdText.replace(Jsoup.parse("&nbsp;").text(), " ");`

Michael Mrozek 2010-09-25 06:33:27
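
A sketch of the same cleanup without going through Jsoup, assuming the unwanted characters are non-breaking spaces (U+00A0, which is what `&nbsp;` decodes to). The `NbspCleanup` class and the sample string are hypothetical. One detail worth knowing: `String.trim()` does not strip U+00A0 on its own, so the replacement has to happen first.

```java
public class NbspCleanup {
    // The character that the &nbsp; entity decodes to.
    private static final char NBSP = '\u00A0';

    // Turns non-breaking spaces into ordinary spaces, then trims the ends.
    public static String clean(String text) {
        return text.replace(NBSP, ' ').trim();
    }

    public static void main(String[] args) {
        // Hypothetical cell text with a non-breaking space on each side.
        String tdText = "\u00A021,670,510,504 $\u00A0";
        System.out.println("[" + clean(tdText) + "]");
    }
}
```

If a literal "Â" still shows up after this, the page is probably being decoded with the wrong charset (the UTF-8 bytes of U+00A0 read as a single-byte encoding), which is a separate problem from the non-breaking space itself.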

Answer 4

A:

As far as I know, you can parse the HTML file using an XMLReader, for example. Check this article: http://www.ibm.com/developerworks/xml/library/x-andbene1/

Kharizmi 2010-09-03 03:39:11
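
A rough sketch of the XMLReader (SAX) route, assuming the markup is well-formed XHTML; ordinary HTML with unclosed tags will most likely make the parser throw. The `SaxCellText` class, the `CellHandler` helper, and the sample table are hypothetical. It collects the characters between `td` tags, in the spirit of the first answer.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCellText {
    // Collects the text of every <td> element, one cell per line.
    static class CellHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        private final StringBuilder current = new StringBuilder();
        private boolean inCell = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("td".equals(qName)) {
                inCell = true;
                current.setLength(0); // start a fresh buffer for this cell
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append each chunk.
            if (inCell) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("td".equals(qName)) {
                inCell = false;
                out.append(current.toString().trim()).append('\n');
            }
        }
    }

    public static String cellText(String xhtml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CellHandler handler = new CellHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xhtml)));
            return handler.out.toString();
        } catch (Exception e) {
            // Malformed input and parser setup errors both end up here.
            throw new IllegalStateException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, well-formed fragment.
        String xhtml = "<table><tr><td class=\"bodybox\">21,670,510,504 $</td><td>n/a</td></tr></table>";
        System.out.print(cellText(xhtml));
    }
}
```

For pages that are not well-formed, a lenient parser such as Jsoup (see Answer 3) is likely the easier choice.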
