Xpath and HTML Cleaner problem, no data returned. | ansaurus

Q:

Hi, I'm new to the community. I've been up all night trying to flesh out the HTML-reading system that's at the core of my app's functionality, and I could really use a fresh pair of eyes on this one.

Problem: I'm trying to return a string to be displayed on my app's home activity. I'm almost certain that the data is taken in correctly and cleaned up into XML via "Html Cleaner" (http://htmlcleaner.sourceforge.net/), and that once it is pulled through Jaxen (open-source XPath) the result should display some text. Despite my efforts, I have yet to figure out exactly why it won't. My code follows below.

As a test, I'm trying to pull the word "maps" from the http://www.google.com home page. It sits inside an <a> tag with the hyperlink "http://maps.google.com/maps?hl=en&tab=wl" (which I'm using to uniquely identify the tag):

public class home extends Activity {

  TextView text1;


  /** Called when the activity is first created. */
  @Override
  public void onCreate(Bundle savedInstanceState)
  {
   super.onCreate(savedInstanceState);
   setContentView(R.layout.main);

   text1 = (TextView)findViewById(R.id.text1);
   text1.setText(LoadHTMLFromURL("http://www.google.com"));
  }



  private String LoadHTMLFromURL(String url)
  {
   try
   {
    // Load data from URL     
     InputStream is = (InputStream) new URL(url).getContent(); //generate
     BufferedReader reader = new BufferedReader(new InputStreamReader(is));
     StringBuilder stringBuilder = new StringBuilder();
     String line = null;

     while ((line = reader.readLine()) != null) 
     {
      stringBuilder.append(line + "");
     }
     is.close();

     String HTMLout = stringBuilder.toString();

     // Clean up HTML input.
     //Initialize HTML Cleaner.
     HtmlCleaner cleaner = new HtmlCleaner();

     // This next line Cleans the html and exports it to a Tagnode named "node"
     TagNode node = cleaner.clean(HTMLout);

     // This is the xpath parsing info
     String SearchTerm = "//a[@href='http://maps.google.com/maps?hl=en&amp;amp;tab=wl']";


     Object[] info_nodes = node.evaluateXPath(SearchTerm);

     TagNode info_node = (TagNode) info_nodes[0];
     String info = info_node.getChildren().iterator().next().toString().trim();

     return info;
   }

   catch (Exception e) 
   {
    System.out.println( "Inside: home.LoadHTMLFromURL()" + "Exc="+e);
    return null;
   }

  }
 }

I apologize for the clutter and lack of neatness in the code; I'm still a low-to-mid-range programmer at a "learn as you go" stage. Any advice is appreciated.

Side note: I ran a string containing some simple handmade XML to test whether it would read the info, and it worked perfectly, but it does not work on XML generated from HTML web pages.

A:

OK, I believe the issue was my search term: my XPath expression was typed wrong.
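
For anyone hitting the same symptom, here is a minimal sketch of how a mistyped search term can come back empty. It uses the JDK's built-in XPath engine on a small hand-written XML snippet, not HtmlCleaner, and the class name `HrefXPathDemo`, the helper `countMatches`, and the sample markup are all made up for illustration. It assumes the parsed document holds the href with a plain `&`, which is what a standard XML parser produces after decoding `&amp;`; whether HtmlCleaner decodes attribute values the same way is not something this sketch confirms.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HrefXPathDemo {

    // Hypothetical sample markup: in XML source, '&' inside an attribute is written "&amp;".
    static final String XML =
        "<html><body>"
        + "<a href=\"http://maps.google.com/maps?hl=en&amp;tab=wl\">Maps</a>"
        + "</body></html>";

    // Returns how many nodes the given XPath expression selects in the sample document.
    static int countMatches(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // The parser has already decoded "&amp;" to "&", so the literal uses a plain "&".
        System.out.println(countMatches("//a[@href='http://maps.google.com/maps?hl=en&tab=wl']"));
        // An escaped literal is compared character by character against the decoded value.
        System.out.println(countMatches("//a[@href='http://maps.google.com/maps?hl=en&amp;tab=wl']"));
        // A looser predicate avoids depending on the exact query string.
        System.out.println(countMatches("//a[contains(@href, 'maps.google.com')]"));
    }
}
```

If the exact-match form returns nothing on a real page, printing the href values of all `//a` nodes first is a quick way to see what string the expression actually has to match. Checking the length of the result before indexing into it also avoids ending up in the catch block with only an exception message to go on.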

2010-08-16 06:04:44
