What is the best way to extract all the input fields (text, radio button, checkbox, select, etc.) semi-automatically from badly formatted HTML documents?

I'm trying to get the TYPE, NAME and VALUE of each field, plus the OPTIONs for each SELECT.

I am currently using XPath (in PHP) because everyone here says 'use that instead', but I'm getting nowhere with it, so I am open to suggestions. I have a shell available, so a solution using 'ordinary' grep would be fine too.

Thanks. Matt

A: 

You could use the jQuery framework in conjunction with good old Firebug (a Firefox add-on). jQuery's selector engine makes it easy to find all the form elements on a page, and Firebug will happily log them to Firefox's JavaScript console.

As you said: this is SEMI-AUTOMATIC.

EDIT

To get you started, you may want to take a look at jQuery's API documentation. It has a handy serialize method that helps a lot: it turns a form's named fields into a name=value string.
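A rough sketch of what that could look like in the console, assuming jQuery is already loaded on the page. The helper name describeField and the shape of the objects it returns are made up for illustration, not part of jQuery. Note that serialize only gives you name=value pairs, so the TYPE and the OPTIONs are read from the elements directly here:

```javascript
// Summarise one form control as a plain object.
// describeField is a hypothetical helper; it reads standard DOM properties
// (tagName, type, name, value and, for <select>, options).
function describeField(el) {
  var tag = String(el.tagName || '').toLowerCase();
  var info = {
    tag: tag,
    // <select> and <textarea> report types such as "select-one" or "textarea"
    type: el.type || tag,
    name: el.name || '',
    value: el.value == null ? '' : String(el.value)
  };
  if (tag === 'select') {
    // Collect every <option>, not only the selected one
    info.options = Array.prototype.map.call(el.options || [], function (opt) {
      return { value: opt.value, text: opt.text, selected: !!opt.selected };
    });
  }
  return info;
}

// In the browser console, with jQuery available on the page:
if (typeof jQuery !== 'undefined') {
  // ":input" matches input, select, textarea and button elements
  var fields = jQuery(':input').map(function () {
    return describeField(this);
  }).get();
  console.log(JSON.stringify(fields, null, 2));
}
```

Since the browser has already repaired the broken markup by the time the page is rendered, this sidesteps most of the parsing trouble, but it is still a manual, per-page step.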

aefxx
Thanks, it's a cool idea, but my preference would (obviously) be something more automated (i.e. on the command line or in a PHP script).
Matt
No problem, you could send your result via AJAX to a server-side script and ... you know. Once again, jQuery will be your friend here.
aefxx
A: 

See this for an 'almost-there' solution:
http://stackoverflow.com/questions/1100123/get-html-page-input-values-and-names-using-regex-on-php

Thanks!

Matt