What is the best approach on Android for retrieving information from an HTML page hosted on the internet?

For example I'd like to be able to get the text from the following page at the start of each day:

http://www.met.ie/forecasts/sea-area.asp

I have been downloading and parsing XML files, but I have never tried to parse information from an HTML file before.

Is there a native way to parse the information I want?

Or do I need a third party library?

Or do I need to look into screen scraping?

+1  A: 

Is there a native way to parse the information I want?

No.

Or do I need a third party library?

Yes.

Or do I need to look into screen scraping?

What you are looking to do fits the term "screen scraping" as it is used with respect to Web sites. As I wrote in a previous question on this topic, to parse HTML, you use an HTML parser. There are several open source ones, and it is reasonably likely that one or more will work on Android with few modifications if any.

CommonsWare
+2  A: 

If you are parsing HTML, regardless of how you do it, you are screen scraping. Techniques run the gamut from regular expressions to third-party libraries like jTidy. The only problem is whether jTidy works on Android. I don't know; you'll have to research it.

I'd suggest using regular expressions: compile each one once and cache the resulting Pattern object for performance.
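As a rough sketch of what caching the Pattern could look like, here is a small helper that uses only java.util.regex, which is available on Android. The class name ForecastScraper, the method extractForecast, and the `<p class="forecast">` markup are all made up for illustration; I have not checked what the real page's HTML looks like, so the expression would need to be adapted to it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names and the markup it looks for are assumptions.
final class ForecastScraper {
    // Compiled once and cached in static fields. Pattern is immutable and
    // thread-safe, so the same instances can be reused for every page fetched.
    // Assumed markup: <p class="forecast"> ... </p> (not the real page layout).
    private static final Pattern FORECAST = Pattern.compile(
            "<p\\s+class=\"forecast\"\\s*>(.*?)</p>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches any remaining inline tag inside the captured text.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");

    // Matches runs of whitespace so they can be collapsed to one space.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private ForecastScraper() {}

    /** Returns the forecast text, or null if the expected markup is not found. */
    static String extractForecast(String html) {
        Matcher m = FORECAST.matcher(html);
        if (!m.find()) {
            // The layout may have changed, or this is not the expected page.
            return null;
        }
        String inner = TAGS.matcher(m.group(1)).replaceAll(" ");
        return WHITESPACE.matcher(inner).replaceAll(" ").trim();
    }
}
```

You would call ForecastScraper.extractForecast(html) with the downloaded page as a String and treat a null result as "the page no longer looks the way I expected" rather than letting the app crash.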

If you can't get a proper web service API for the data you want, you always run the risk of the author changing the layout, moving the data, and breaking your code. That's why screen scraping is generally frowned upon and used only as a last-ditch effort.

chubbard