views: 46
answers: 2

Hello, I am trying to extract specific content (links, text, images) from an HTML page. Is there some program out there that I can use to produce a visual representation of the DOM model of the page? I know I could write such a program in Java using an HTML parser, but before I do that, I thought I would see if such a program already exists.

My main objective is to extract certain links, image URLs, and text, and send these to a Flex applet on the page. Thanks, Vance
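If you do end up writing it yourself, the Java side can stay fairly small. As a rough sketch using only the JDK's built-in Swing HTML parser (the class name `LinkExtractor` and the choice to collect links and image sources into one list are just illustrative assumptions, not a recommended design), it might look like:

```java
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LinkExtractor {

    // Collect anchor hrefs and image srcs from an HTML string.
    public static List<String> extractLinks(String html) throws Exception {
        List<String> found = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // <a> has content, so it arrives as a start tag
                if (t == HTML.Tag.A) {
                    Object href = a.getAttribute(HTML.Attribute.HREF);
                    if (href != null) found.add(href.toString());
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // <img> is an empty element, so it arrives as a simple tag
                if (t == HTML.Tag.IMG) {
                    Object src = a.getAttribute(HTML.Attribute.SRC);
                    if (src != null) found.add(src.toString());
                }
            }
        };

        new ParserDelegator().parse(new StringReader(html), callback, true);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><a href=\"http://example.com\">x</a>"
                    + "<img src=\"pic.png\"></body></html>";
        // URLs come back in document order
        System.out.println(extractLinks(html));
    }
}
```

A third-party parser such as jsoup would tolerate messy real-world markup better than the Swing parser (which only understands older HTML), but this shows the basic callback shape either way.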

A: 

I think your best bet would be jQuery and GreaseMonkey... GreaseMonkey would inject the script, and jQuery can efficiently traverse the HTML DOM. Note that this is probably a Firefox-only solution, since I think GreaseMonkey is a Firefox-only utility.

Michael Bray
GreaseMonkey is Firefox-only … but the OP was trying to avoid writing his own software for this, and your solution just provides some libraries that a custom program could use.
David Dorward
That's not how I took it at all, given that he wanted to send the results to another program... To me that implies he wanted some code to be able to do it. But considering that he accepted the Firebug extension (which is also Firefox-only), I guess I was mistaken.
Michael Bray
A: 

If you just want to extract a few bits of information (rather than, say, print out the entire page structure), then you can use the Firebug extension for Firefox.

Choose the HTML tab, then click the second icon from the left (it looks like a cursor pointing at a box), then click the part of the page you're interested in to jump to that part of the DOM.

Dan
Thank you!! This is exactly what I wanted! I thought this type of program must exist, but I didn't know what it would be called.
JavaMan