I would like to get a human-readable text representation of a web page that still shows elements such as hyperlink targets and input fields.
Is there a library that does this? (I've looked at the Jericho Renderer, but it does not show input fields.)
For example, I would like to convert this:

<div>
<form action="example.php">
Name:
<input type="text" name="name_field">
<input type="button" value="OK">
</form>
</div>

to something like this:

Name: [________] [OK]
A: 

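Try TagSoup and build the renderer yourself. It parses real-world HTML into a model you can walk (for example a DOM tree), and you emit whatever text you want for each element, including input fields.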
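As a rough illustration of the "walk the DOM and spit out the text" idea, here is a minimal sketch. It is not TagSoup's API: to stay self-contained it uses the JDK's XML parser, so the demo input must be well-formed XHTML (note the self-closed `<input/>` tags), whereas real-world HTML would need a tolerant parser such as TagSoup in front. The class name `DomTextRenderer`, the method names, and the `[________]` placeholder are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: renders a DOM tree as one line of plain text.
class DomTextRenderer {

    // Appends a token, separated from previous output by a single space.
    private static void append(StringBuilder out, String token) {
        if (out.length() > 0) {
            out.append(' ');
        }
        out.append(token);
    }

    // Recursively walks the DOM and appends a text rendering of each node.
    static void render(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Collapse runs of whitespace; skip whitespace-only nodes.
            String text = node.getNodeValue().trim().replaceAll("\\s+", " ");
            if (!text.isEmpty()) {
                append(out, text);
            }
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            String name = el.getTagName().toLowerCase(Locale.ROOT);
            if (name.equals("input")) {
                String type = el.getAttribute("type").toLowerCase(Locale.ROOT);
                if (type.equals("button") || type.equals("submit")) {
                    append(out, "[" + el.getAttribute("value") + "]");
                } else if (!type.equals("hidden")) {
                    // Any other visible input is drawn as an empty field.
                    append(out, "[________]");
                }
                return;
            }
        }
        // Documents and all other elements: render the children in order.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            render(c, out);
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            if (el.getTagName().equalsIgnoreCase("a") && el.hasAttribute("href")) {
                // Show the link target after the link text.
                append(out, "<" + el.getAttribute("href") + ">");
            }
        }
    }

    // Demo entry point: parses well-formed XHTML with the JDK parser.
    // With TagSoup you would obtain the DOM from its output instead.
    static String renderXhtml(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            StringBuilder out = new StringBuilder();
            render(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<div><form action=\"example.php\">Name:\n"
                + "<input type=\"text\" name=\"name_field\"/>\n"
                + "<input type=\"button\" value=\"OK\"/>\n"
                + "</form></div>";
        System.out.println(renderXhtml(page)); // prints: Name: [________] [OK]
    }
}
```

The same walk can be extended with more cases (checkboxes, `<select>`, line breaks for block elements) as needed; only the parsing step changes when you swap in a tolerant HTML parser.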

xcut