Hi, I'm new to the community. I've been up all night trying to flesh out the underlying HTML-reading system that's at the core of my app's functionality. I could really use a fresh pair of eyes on this one.
Problem: I'm trying to return a string to be displayed on my app's home activity. I'm almost certain that the data is taken in correctly, cleaned up into XML via HtmlCleaner (http://htmlcleaner.sourceforge.net/), and pulled through Jaxen (open-source XPath), so the result should display some text. Despite my efforts, I've yet to figure out exactly why it won't. My code follows below.
As a test, I'm trying to pull the word "maps" from the http://www.google.com home page, which is inside an <a> tag with the hyperlink "http://maps.google.com/maps?hl=en&tab=wl" (which I'm using to uniquely identify the tag):
    public class home extends Activity {
        TextView text1;

        /** Called when the activity is first created. */
        @Override
        public void onCreate(Bundle savedInstanceState)
        {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.main);
            text1 = (TextView)findViewById(R.id.text1);
            text1.setText(LoadHTMLFromURL("http://www.google.com"));
        }

        private String LoadHTMLFromURL(String url)
        {
            try
            {
                // Load data from URL
                InputStream is = (InputStream) new URL(url).getContent(); //generate
                BufferedReader reader = new BufferedReader(new InputStreamReader(is));
                StringBuilder stringBuilder = new StringBuilder();
                String line = null;
                while ((line = reader.readLine()) != null)
                {
                    stringBuilder.append(line + "");
                }
                is.close();
                String HTMLout = stringBuilder.toString();

                // Clean up HTML input.
                // Initialize HTML Cleaner.
                HtmlCleaner cleaner = new HtmlCleaner();
                // This next line cleans the HTML and exports it to a TagNode named "node"
                TagNode node = cleaner.clean(HTMLout);

                // This is the XPath parsing info
                String SearchTerm = "//a[@href='http://maps.google.com/maps?hl=en&tab=wl']";
                Object[] info_nodes = node.evaluateXPath(SearchTerm);
                TagNode info_node = (TagNode) info_nodes[0];
                String info = info_node.getChildren().iterator().next().toString().trim();
                return info;
            }
            catch (Exception e)
            {
                System.out.println( "Inside: home.LoadHTMLFromURL()" + "Exc="+e);
                return null;
            }
        }
    }
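One thing I could do to narrow this down: if the XPath matches nothing, `info_nodes[0]` would throw, the catch block would return null, and the TextView would simply stay blank. That is assuming `evaluateXPath` hands back an empty array in that case, which I have not confirmed. Below is a minimal sketch of a guard that would make that case visible on screen. `FirstMatch` and `describe` are hypothetical names of my own, not part of HtmlCleaner.

```java
final class FirstMatch {
    // Hypothetical helper (not part of HtmlCleaner): returns the first XPath
    // match as trimmed text, or a readable message when nothing matched.
    static String describe(Object[] matches, String expression) {
        if (matches == null || matches.length == 0) {
            return "No node matched: " + expression;
        }
        return String.valueOf(matches[0]).trim();
    }

    public static void main(String[] args) {
        String expr = "//a[@href='http://maps.google.com/maps?hl=en&tab=wl']";
        // Empty result: prints the "No node matched" message instead of throwing.
        System.out.println(describe(new Object[0], expr));
        // Non-empty result: prints the first element with whitespace trimmed.
        System.out.println(describe(new Object[] {" Maps "}, expr));
    }
}
```

In the activity this would be called as `describe(info_nodes, SearchTerm)`, so the home screen shows either the matched text or the expression that failed to match.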
I apologize for the clutter and lack of neatness in the code; I'm still a low-to-mid-level programmer in the "learn as you go" stage. Any advice is appreciated.
Side note: I ran a string containing some simple hand-made XML to test whether it would read the info, and it worked perfectly, but the same approach does not work on XML generated from HTML web pages.
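For reference, the hand-made XML check looked roughly like the sketch below. It is a reconstruction with some assumptions: it uses the JDK's built-in `javax.xml.xpath` rather than HtmlCleaner or Jaxen so that it runs on its own, the sample markup is made up and is not Google's real page, and `HandMadeXmlCheck` and `linkText` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class HandMadeXmlCheck {
    // Made-up sample standing in for cleaned HTML. Inside an XML attribute,
    // '&' has to be written as '&amp;'; the parser turns it back into '&'.
    static final String SAMPLE_XML =
        "<html><body>"
        + "<a href=\"http://maps.google.com/maps?hl=en&amp;tab=wl\">Maps</a>"
        + "<a href=\"http://www.google.com/imghp\">Images</a>"
        + "</body></html>";

    // Returns the text of the first <a> whose href equals the given value,
    // or an empty string when no such element exists.
    // Assumes href contains no single quote (it is spliced into the XPath literal).
    static String linkText(String xml, String href) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "//a[@href='" + href + "']";
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // prints "Maps"
        System.out.println(linkText(SAMPLE_XML, "http://maps.google.com/maps?hl=en&tab=wl"));
    }
}
```

Since this matches on the exact href string, the same expression against a real page would only succeed if the cleaned output contains that attribute value character for character.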