I am trying to load and parse HTML in Adobe AIR. The main purpose is to extract the title, meta tags and links. I have been trying HTMLLoader, but I get all sorts of errors, mainly uncaught JavaScript exceptions.
I also tried loading the HTML content directly (using URLLoader) and pushing the text into HTMLLoader (using loadString(...)), but got the same errors. My last resort was to load the text into XML and then use E4X queries or XPath. No luck there either, because the HTML is not well-formed.
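The goal is only to get the title, meta tags and links. One fallback that needs neither a DOM nor well-formed XML is to scan the raw text returned by URLLoader with regular expressions. The sketch below is written in TypeScript only so that it runs stand-alone. ActionScript 3's RegExp class uses essentially the same ECMAScript pattern syntax, so the patterns should carry over with little change, but that is an assumption worth checking. The function and type names (`extractPageInfo`, `parseAttributes`, `startTagPattern`, `PageInfo`) are hypothetical. "Links" is assumed to mean the `href` values of `<a>` tags. The sketch is deliberately crude: it is not an HTML parser, and it does not decode entities.

```typescript
// Crude regex-based extraction of <title>, <meta> attributes and <a href>
// values from HTML that is not well-formed XML. Hypothetical helper names;
// an illustrative sketch under stated assumptions, NOT a real HTML parser.

interface PageInfo {
  title: string | null;
  metas: Array<Record<string, string>>; // attribute map of each <meta> tag
  links: string[];                      // href of each <a> tag, in order
}

// Matches a start tag of the given element; quoted attribute values may contain ">".
function startTagPattern(tagName: string): RegExp {
  return new RegExp("<" + tagName + "\\b(?:[^>\"']|\"[^\"]*\"|'[^']*')*>", "gi");
}

// Collects name=value pairs (double-quoted, single-quoted or bare values).
// Attribute names are lower-cased; valueless attributes are skipped and
// entities such as &amp; are left as-is.
function parseAttributes(tag: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const body = tag.replace(/^<[^\s>\/]+/, ""); // drop the "<meta" / "<a" prefix
  const attrRe = /([^\s"'>\/=]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
  let m: RegExpExecArray | null;
  while ((m = attrRe.exec(body)) !== null) {
    // Exactly one of groups 2-4 matched; "??" keeps empty strings like content="".
    attrs[m[1].toLowerCase()] = m[2] ?? m[3] ?? m[4] ?? "";
  }
  return attrs;
}

function extractPageInfo(html: string): PageInfo {
  // Drop comments, scripts and styles so markup inside them is ignored.
  const cleaned = html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<(script|style)\b[\s\S]*?<\/\1\s*>/gi, "");

  // First <title> element, with internal whitespace collapsed.
  const titleMatch = /<title\b[^>]*>([\s\S]*?)<\/title\s*>/i.exec(cleaned);
  const title = titleMatch ? titleMatch[1].replace(/\s+/g, " ").trim() : null;

  const metaTags: string[] = cleaned.match(startTagPattern("meta")) ?? [];
  const anchorTags: string[] = cleaned.match(startTagPattern("a")) ?? [];

  const links: string[] = [];
  for (const tag of anchorTags) {
    const href = parseAttributes(tag)["href"];
    if (href !== undefined) links.push(href);
  }
  return { title, metas: metaTags.map((t) => parseAttributes(t)), links };
}

// Usage with deliberately sloppy markup (unclosed tags, mixed case, bare values):
const sample = `<html><head><TITLE> My  Page </TITLE>
<meta name="description" content="a > b"><meta charset=utf-8>
<!-- <a href="hidden.html"> -->
</head><body><a href='one.html'>1<a HREF=two.html>2<p>unclosed</body>`;
console.log(JSON.stringify(extractPageInfo(sample)));
```

Comments, `<script>` and `<style>` blocks are stripped first so that anchors inside them are not reported. The start-tag pattern also allows `>` inside quoted attribute values, which a naive `<meta[^>]*>` would not. Markup that is malformed in other ways, such as an unterminated quote, can still confuse it.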
My questions are:
- Is there a simple and reliable (AIR/ActionScript) DOM component out there? I do not need to display the page; a headless mode will do.
- Is there any library to convert (crappy) HTML into well-formed XML so I can use XPath/E4X?
- Any other suggestions on how to do this?
thx