I need to parse a list of bookmarks exported from a browser such as Chrome, Firefox, or IE, and maybe even from Google and others.

I experimented with looping over the matches of reMatchNoCase("(<h3)(.*?)(</dl>)",myfile1). Inside each h3/dl match I then call reMatchNoCase("(<dt[>])(.*?)(</a>)",i), followed by a lot of cleanup, but it's really not reliable.

The thing is that the exports mark categories with h3 tags inside dl tags, and each category's bookmarks sit in the dl that follows its h3. I can't just extract every URL, since I want to keep the categories as they appear in the browser.

Thanks.

+3  A: 

If it is XHTML, use XPath.
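For the XHTML case, a minimal sketch of the XPath idea might look like the following. It uses Python's standard xml.etree.ElementTree purely for illustration; the sample markup, the function name bookmarks_by_category, and the assumption that each category is a dt holding an h3 plus a nested dl are all hypothetical, and a real export may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample; real exports are usually not valid XML.
XHTML = """<dl>
  <dt><h3>News</h3>
    <dl>
      <dt><a href="http://example.com/a">Site A</a></dt>
      <dt><a href="http://example.com/b">Site B</a></dt>
    </dl>
  </dt>
  <dt><h3>Tools</h3>
    <dl>
      <dt><a href="http://example.com/c">Site C</a></dt>
    </dl>
  </dt>
</dl>"""


def bookmarks_by_category(xhtml):
    """Map each category name to a list of (title, url) pairs."""
    root = ET.fromstring(xhtml)
    result = {}
    # ".//dt[h3]" selects every dt element that has an h3 child.
    for dt in root.findall(".//dt[h3]"):
        name = dt.find("h3").text
        # Only the links directly inside this category's own dl.
        links = dt.findall("./dl/dt/a")
        result[name] = [(a.text, a.get("href")) for a in links]
    return result


if __name__ == "__main__":
    for category, links in bookmarks_by_category(XHTML).items():
        print(category, links)
```

ET.fromstring raises a ParseError on markup that is not well-formed, so this route only applies when the file really is XHTML.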

If it is not, it won't be easy. Search http://stackoverflow.com/search?q=parse+html

Could you consider a hybrid approach: parse with jQuery on the client side first, then post the result to CF?

Henry
FF3 does not save valid XHTML. It looks like a subset of HTML 3 or thereabouts, and is internally labeled a "Netscape Navigator" file.
Ben Doom
Maybe this helps? http://java.sun.com/products/jfc/tsc/articles/bookmarks/Bookmarks.java
Henry
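
Since the export is not valid XHTML, one alternative to regular expressions is a tolerant, event-based HTML tokenizer that tracks the current category with a stack. The sketch below uses Python's standard html.parser for illustration only; the same idea should carry over to Java or CFML. The class name BookmarkParser, the helper parse_bookmarks, and the sample file are hypothetical. It assumes each h3 is followed by the dl that holds its bookmarks, as described in the question.

```python
from html.parser import HTMLParser

# Hypothetical sample in the "Netscape" bookmark style (unclosed DT and p tags).
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>News</H3>
    <DL><p>
        <DT><A HREF="http://example.com/a">Site A</A>
        <DT><H3>Tech</H3>
        <DL><p>
            <DT><A HREF="http://example.com/b">Site B</A>
        </DL><p>
    </DL><p>
    <DT><A HREF="http://example.com/c">Loose</A>
</DL><p>
"""


class BookmarkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.path = []       # names of the dl blocks currently open
        self.pending = None  # last h3 title, waiting for its dl
        self.capture = None  # "h3" or "a" while collecting its text
        self.text = []
        self.href = None
        self.bookmarks = []  # (folder path, title, url) tuples

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "dl":
            # The outermost dl has no h3, so None is pushed for it.
            self.path.append(self.pending)
            self.pending = None
        elif tag in ("h3", "a"):
            self.capture = tag
            self.text = []
            if tag == "a":
                self.href = dict(attrs).get("href")

    def handle_data(self, data):
        if self.capture:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "dl":
            if self.path:
                self.path.pop()
        elif tag == self.capture:
            title = "".join(self.text).strip()
            if tag == "h3":
                self.pending = title
            else:
                folders = tuple(p for p in self.path if p)
                self.bookmarks.append((folders, title, self.href))
            self.capture = None


def parse_bookmarks(html):
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.bookmarks


if __name__ == "__main__":
    for folders, title, url in parse_bookmarks(SAMPLE):
        print(" / ".join(folders) or "(top level)", "|", title, "|", url)
```

Because unclosed DT and p tags are simply ignored, the stack only changes on dl open and close, which is what keeps nested categories intact.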