parse html in adobe air | ansaurus

tags:

views:

237

answers:

1

+1 Q:

parse html in adobe air

I am trying to load and parse html in adobe air. The main purpose being to extract title, meta tags and links. I have been trying the HTMLLoader but I get all sort of errors, mainly javascript uncaught exceptions.

I also tried to load the html content directly (using URLLoader) and push the text into HTMLLoader (using loadString(...)) but got the same error. Last resort was to try and load the text into xml and then use E4X queries or xpath, no luck there cause the html is not well formed.

My questions are:

Is there simple and reliable (air/action script) DOM component there (I do not need to display the page and headless mode will do)?
Is there any library to convert (crappy) html into well formed xml so I can use xpath/E4X
Any other suggestions on how to do this?

thx

+1 A:

Afaik:

No :-(
No :-(
I think the easiest way to grab title and meta tags is writing some regular expressions. You can load the page's HTML code into a string and then read out whatever you need like this:

var str:String = ""; // put HTML code in here

var pattern:RegExp = /<title>(.+)<\/title>/i;

trace(pattern.exec(str));

Thomas 2010-01-19 14:44:12

related questions

Autosizing Textarea

Regular expression for parsing links from a webpage?

What are good tools for creating compiled HTML help files (.chm)?

Looking for WYSIWYG HTML editor

Any reason not to start using the HTML 5 doctype?

HTML comments break down

HTML Comments Markup

Setting a div's height in HTML with CSS

Wrapping lists into columns

Is a "Confirm Email" input good practice when user changes email address?

<XMP> Tag

HTML version choice

Options for HTML scraping?

How do you disable browser Autocomplete on web form field / input tag?

How do I make a checkbox toggle from clicking on the text label as well?

Html CSS Editor

Wordpress theme development offline tools

How do I give my web sites an icon for iPhone?

In HTML, how to word-break on a dash?

Detecting font in JavaScript

How do you test layout design across multiple browsers/OSs?

How do I print an HTML document from a web service?

Multiple submit buttons on a HTML form

How can I determine a web user's time zone?

Why doesn't the percentage width child in absolutely positioned parent work in IE7?