I am trying to load and parse HTML in Adobe AIR. The main purpose is to extract the title, meta tags, and links. I have been trying HTMLLoader, but I get all sorts of errors, mainly uncaught JavaScript exceptions.

I also tried loading the HTML content directly (using URLLoader) and pushing the text into HTMLLoader (using loadString(...)), but got the same errors. My last resort was to load the text into XML and then use E4X queries or XPath, but no luck there because the HTML is not well formed.

My questions are:

  1. Is there a simple and reliable (AIR/ActionScript) DOM component out there? (I do not need to display the page; headless mode will do.)
  2. Is there any library to convert (crappy) HTML into well-formed XML so I can use XPath/E4X?
  3. Any other suggestions on how to do this?

thx

+1  A: 

AFAIK:

  1. No :-(
  2. No :-(
  3. I think the easiest way to grab the title and meta tags is to write some regular expressions. You can load the page's HTML source into a string and then read out whatever you need like this:

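var str:String = ""; // put the page's HTML source in here

// [^>]* allows attributes on <title>, the s flag lets . match newlines,
// and the lazy .*? stops at the first </title> instead of the last one
var pattern:RegExp = /<title[^>]*>(.*?)<\/title>/is;

var result:Object = pattern.exec(str); // null when there is no <title>
trace(result ? result[1] : "no title found"); // group 1 is the title text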
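
The same idea can probably be stretched to meta tags and links. What follows is only a rough sketch, not a real HTML parser: extractMeta and extractLinks are made-up helper names, and the patterns assume double-quoted attributes, with name appearing before content in each meta tag. Pages that break those assumptions will simply yield no matches for the affected tags.

```actionscript
// Sketch only: regex-based extraction under the assumptions stated above.
// extractMeta and extractLinks are hypothetical helpers, not part of AIR.

// Returns an Object mapping each meta name (lowercased) to its content.
function extractMeta(html:String):Object {
    var result:Object = {};
    var pattern:RegExp = /<meta\s+name="([^"]*)"\s+content="([^"]*)"[^>]*>/gi;
    var match:Object;
    // With the g flag, exec() resumes from pattern.lastIndex on each call
    // and returns null once there are no more matches.
    while ((match = pattern.exec(html)) != null) {
        result[String(match[1]).toLowerCase()] = match[2];
    }
    return result;
}

// Returns an Array of href values found in <a> tags.
function extractLinks(html:String):Array {
    var links:Array = [];
    var pattern:RegExp = /<a\s[^>]*?href="([^"]*)"/gi;
    var match:Object;
    while ((match = pattern.exec(html)) != null) {
        links.push(match[1]);
    }
    return links;
}

// Made-up sample input for illustration
var sample:String = '<meta name="Description" content="A test page">' +
    '<a class="x" href="http://example.com/">home</a><a href="/about">about</a>';

trace(extractMeta(sample)["description"]);
trace(extractLinks(sample).join(", "));
```

Relative links such as /about come back as written, so they would still need to be resolved against the page URL before being followed.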
Thomas