ansaurus

Question

Get HTML page <input> values and names using regex on PHP

Answer 1

+4 A:

If you think I shouldn't use regex, but something like xpath, say how.

That would be something like

<?php
$doc = new DOMDocument;
if (!$doc->loadHTML($contents)) {
  echo 'something went wrong';
} else {
  $xpath = new DOMXPath($doc);
  // Select every <input> inside the form named "aspnetForm"
  foreach ($xpath->query('//form[@name="aspnetForm"]//input') as $eInput) {
    echo 'name=', $eInput->getAttribute('name'), ' value=', $eInput->getAttribute('value'), "\n";
  }
}

If you get annoying warning messages, you might want to suppress them with @$doc->loadHTML($contents);, possibly in conjunction with libxml_use_internal_errors() and libxml_get_errors(), which let you collect the parser errors and inspect them yourself.
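
To make that last suggestion concrete, here is a minimal sketch of collecting the libxml errors instead of suppressing them with @. The function name extract_inputs() and the shape of its return value are my own assumptions for illustration, not part of the DOM extension. It also queries //input across the whole document rather than a specific form, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns name => value pairs for every <input> in $contents,
// together with whatever parse errors libxml collected along the way.
function extract_inputs(string $contents): array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument;
    $loaded = $doc->loadHTML($contents);

    // Fetch the collected errors (an array of LibXMLError objects), then reset state.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $inputs = [];
    if ($loaded) {
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query('//input') as $eInput) {
            $inputs[$eInput->getAttribute('name')] = $eInput->getAttribute('value');
        }
    }

    return ['inputs' => $inputs, 'errors' => $errors];
}
```

Each entry in 'errors' carries public properties such as message and line, so you can log them or ignore them as you see fit.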

VolkerK 2009-07-08 21:19:57

Answer 2

+2 A:

How about this --> http://simplehtmldom.sourceforge.net/

* An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way.
* Requires PHP 5+.
* Supports invalid HTML.
* Find tags on an HTML page with selectors just like jQuery.
* Extract contents from HTML in a single line.

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}

Good luck.

Wbdvlpr 2009-07-08 21:22:38

"Good luck." Thanks, I'll need it. :)

Pedro Cunha 2009-07-08 21:32:04

Answer 3

+1 A:

OK. Since you asked: You should not try to parse non-regular languages with regular expressions. A simple heuristic is: if the language seems "nested", it is not regular.
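
A small illustration of that heuristic, as a sketch rather than a proof: the patterns below are plain, non-recursive regexes that I made up for the demonstration, and the sample markup is hypothetical. Neither the lazy nor the greedy variant tracks nesting, while a DOM parser does.

```php
<?php
$nested   = '<div>a<div>b</div>c</div>';
$siblings = '<div>a</div><div>b</div>';

// Lazy quantifier: stops at the first closing tag, cutting the outer element short.
preg_match('~<div>(.*?)</div>~s', $nested, $lazy);
echo $lazy[1], "\n";   // a<div>b

// Greedy quantifier: runs on to the last closing tag, swallowing sibling elements.
preg_match('~<div>(.*)</div>~s', $siblings, $greedy);
echo $greedy[1], "\n"; // a</div><div>b

// A DOM parser knows which closing tag belongs to which opening tag.
$doc = new DOMDocument;
$doc->loadHTML($nested);
$outerText = $doc->getElementsByTagName('div')->item(0)->textContent;
echo $outerText, "\n"; // abc
```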

One simple way might be something along the following lines:

$htmldoc = new DOMDocument;
$htmldoc->loadHTMLFile("local_secure.html");
$forms = $htmldoc->getElementsByTagName("form");
$inputs = $forms->item(0)->getElementsByTagName("input");

foreach ($inputs as $input) {
  do_something_with($input->getAttribute("name"));
  do_something_with($input->getAttribute("value"));
}

Add error checks to your liking. Further documentation: http://www.php.net/book.dom
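
For what it's worth, one possible set of error checks might look like the sketch below. The function name read_first_form_inputs() and the choice to throw RuntimeException are assumptions of mine, not anything prescribed by the DOM extension. The type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns name => value pairs for the inputs of the first
// <form> in the file at $path, and throws if the file or the form is missing.
function read_first_form_inputs(string $path): array
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read $path");
    }

    // Keep parser warnings about sloppy markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $htmldoc = new DOMDocument;
    $loaded = $htmldoc->loadHTMLFile($path);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new RuntimeException("Could not parse $path as HTML");
    }

    $forms = $htmldoc->getElementsByTagName('form');
    if ($forms->length === 0) {
        throw new RuntimeException("No <form> element in $path");
    }

    $result = [];
    foreach ($forms->item(0)->getElementsByTagName('input') as $input) {
        // Inputs without a name are not submitted with the form, so skip them.
        if ($input->hasAttribute('name')) {
            $result[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    return $result;
}
```

Calling read_first_form_inputs("local_secure.html") would then give you an array you can loop over, with a try/catch around it for the failure cases.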

Svante 2009-07-08 21:33:14
