I'm developing an application for Android.

I have the content of a web page (all the HTML) in a String, and I need to extract all the text inside the paragraphs (p elements) with class="content".

Example:

<p class="content">La la la</p>
<p class="another">Le le le</p>
<p class="content">Li li li</p>

Result:

La la la
Li li li

What is the best approach to do this?

+1  A: 
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class Test {
    // Downloads the page from the server and prints it line by line.
    void readScreen() {
        try {
            // Open url
            URL url = new URL("http://somewebsite.com");

            // Note:  a more portable URL:
            //url = new URL(getCodeBase().toString() + "/ToDoList/ToDoList.txt");

            URLConnection urlConn = url.openConnection();
            urlConn.setDoInput(true);
            urlConn.setUseCaches(false);

            // DataInputStream.readLine() is deprecated, so read through a
            // BufferedReader instead. UTF-8 is an assumption about the page.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urlConn.getInputStream(), "UTF-8"));
            try {
                String s;
                while ((s = reader.readLine()) != null) {
                    System.out.println(s); // one line of the downloaded HTML
                }
            } finally {
                reader.close();
            }
        } catch (MalformedURLException mue) {
            System.err.println("Bad URL: " + mue.getMessage());
        } catch (IOException ioe) {
            System.err.println("Read failed: " + ioe.getMessage());
        }
    }

    public static void main(String[] args) {
        Test thisTest = new Test();
        thisTest.readScreen();
    }
}
Mike
First of all, thank you for your help :) I've done that; my problem is that I don't know how to extract only certain parts of the page (in my case, all the paragraphs with class="content"). I know that I can search through all the lines manually, but there must be a better way to do it.
It would probably be better for you to download the HTML file and then parse the text in it. You could possibly use some XML utilities to find the tags you want. This is about as much as I have done with the web and Java; sorry I can't be more help.
Mike
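
The XML-utilities idea from the comment above could look roughly like the sketch below. It only works if the markup happens to be well-formed XML, which real-world HTML often is not, so treat it as an illustration under that assumption. The class name XmlContentParagraphs, the method extract, and the wrapping root element are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: not part of any answer in this thread.
class XmlContentParagraphs {
    // Returns the text of every <p class="content"> element.
    // Assumes the markup is well-formed XML (all tags closed, attributes quoted).
    static List<String> extract(String markup) {
        // A fragment may have several top-level elements, so wrap it in one root.
        String wrapped = "<root>" + markup + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

            List<String> result = new ArrayList<String>();
            NodeList paragraphs = doc.getElementsByTagName("p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);
                if ("content".equals(p.getAttribute("class"))) {
                    result.add(p.getTextContent().trim());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Typically means the input was not well-formed XML.
            throw new IllegalArgumentException("Could not parse markup as XML", e);
        }
    }

    public static void main(String[] args) {
        String html = "<p class=\"content\">La la la</p>\n"
                + "<p class=\"another\">Le le le</p>\n"
                + "<p class=\"content\">Li li li</p>";
        for (String text : extract(html)) {
            System.out.println(text);
        }
    }
}
```

If the page is not valid XML, the parser throws and this approach fails, so it is only a fit for clean, XHTML-like input.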
+1  A: 

A regular expression would be your best bet.

http://download-llnw.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
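
For markup as simple as the example in the question, a regular expression with java.util.regex could be sketched like this. The class name ContentParagraphs and the method extract are invented for the illustration, and the pattern assumes that class="content" is double-quoted, is the only attribute on the tag, and that paragraphs are not nested. Anything less regular than that will likely need a different pattern or a real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a sketch, not a general-purpose HTML parser.
class ContentParagraphs {
    // Matches <p class="content">...</p> and captures what is between the tags.
    // DOTALL lets the paragraph body span several lines; the reluctant (.*?)
    // stops at the first closing </p>.
    private static final Pattern CONTENT_P = Pattern.compile(
            "<p\\s+class=\"content\"\\s*>(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extract(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = CONTENT_P.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p class=\"content\">La la la</p>\n"
                + "<p class=\"another\">Le le le</p>\n"
                + "<p class=\"content\">Li li li</p>";
        for (String text : extract(html)) {
            System.out.println(text); // prints "La la la" then "Li li li"
        }
    }
}
```

Note that the captured group is the raw inner markup, so any tags nested inside a matching paragraph would be returned along with the text.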

Paddy Foran