I have a form (on my own blog/CMS install, which I want to experiment with) containing a hidden field whose value I want to extract. The problem is that there are two forms on the page, each with a hidden field of the same name; only the value differs. Something like this:

<input type="hidden" id="_hiddenname" name="_hiddenname" value="valuehere"/>

Both look the same in the HTML source. To tell them apart, I edited the PHP file that generates the page and added some marker words before the field I need. Now the field I don't want looks like the code above, while the field I need looks like this:

mywordshere <input type="hidden" id="_hiddenname" name="_hiddenname" value="valuehere"/>

How do I extract the value from the field I need (the one preceded by mywordshere) if I have the page's HTML source in a PHP variable (fetched with libcurl)?

A: 

The value will be available in either $_GET["_hiddenname"] or $_POST["_hiddenname"], depending on the form's method. Which value you get depends on which form is submitted.

If you have two fields which are named the same within the same form, you have a bigger problem.

Zenham
True if the page is submitted, but the poster asked how to identify the field from a page scrape.
dnagirl
+1  A: 

Presumably the two forms have different names, correct? If you parse your scraped text with something DOM-aware, you should be able to select your input field by searching for it within its parent form.
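
A minimal sketch of that idea using DOMXPath. The form ids "search" and "comment" and the sample values are hypothetical; substitute whatever attribute actually distinguishes the two forms on your page.

```php
<?php
// Hypothetical markup: the form ids and the values are assumptions for illustration.
$html = <<<HTML
<html><body>
<form id="search"><p><input type="hidden" id="_hiddenname" name="_hiddenname" value="first"/></p></form>
<form id="comment"><p><input type="hidden" id="_hiddenname" name="_hiddenname" value="second"/></p></form>
</body></html>
HTML;

// Collect parser complaints (such as the duplicate id) instead of printing warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Restrict the search to inputs that are descendants of the wanted form.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//form[@id="comment"]//input[@name="_hiddenname"]');

$value = $nodes->length > 0 ? $nodes->item(0)->getAttribute('value') : null;
echo $value, "\n";
```

If the forms carry a name or action attribute rather than an id, the same query should work with `@name` or `@action` in the predicate.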

dnagirl
Thanks for the suggestion, that may work...
Phil
A: 

An example using DOMDocument

<?php

$html = <<<HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<body>
  <input type="hidden" id="_hiddenname" name="_hiddenname" value="valuehere">
</body>
</html>
HTML;

$doc = new DOMDocument();
$doc->validateOnParse = true;
$doc->loadHTML( $html );

$node = $doc->getElementById( '_hiddenname' );
echo $node->getAttribute( 'value' );

?>

Note: your HTML string must have a DOCTYPE defined for this to work.

Peter Bailey
A: 

The fact that you have two input fields with the same name and the same id is the real problem. The id attribute for HTML elements is supposed to be unique on a given page, and if it were, you could do this easily with a DOM parser. Example:

$dom = new DOMDocument();
// set parser options before loading the document
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$inputs = $dom->getElementsByTagName('input');
foreach ($inputs as $i)
{
    if ($i->getAttribute('id') === 'targetId') {
        //do some stuff
    }
}

Since you can't take that approach, and you've marked your input with a string that you can identify, I would use a combination of string functions:

$str = 'mywordshere <input type="hidden" id="_hiddenname" name="_hiddenname" value="valuehere"/>';
$pos = strpos($str,'mywordshere');
if ($pos !== false) {
    $valuePos = strpos($str,'value=',$pos);
    if ($valuePos !== false) {
        //get text starting from the 'value=' portion of the string
        $str = substr($str,$valuePos);
        $arr = explode('"',$str);
        //value will be in $arr[1]
        echo $arr[1];
    }
}

I would strongly recommend you rework your element IDs, however, and use the DOM approach.
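
If the IDs cannot be changed, a DOM lookup can still key off the marker text. The following is only a sketch, and it assumes the marker ends up as a text node immediately before the input inside the same parent element (a `<p>` here, which is hypothetical); real markup may nest differently.

```php
<?php
// Hypothetical markup mirroring the question: only the second input is preceded by the marker.
$html = <<<HTML
<html><body>
<form><p><input type="hidden" id="_hiddenname" name="_hiddenname" value="unwanted"/></p></form>
<form><p>mywordshere <input type="hidden" id="_hiddenname" name="_hiddenname" value="wanted"/></p></form>
</body></html>
HTML;

// Collect parser complaints (such as the duplicate id) instead of printing warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Pick the input whose nearest preceding text sibling contains the marker.
$xpath = new DOMXPath($doc);
$query = '//input[@name="_hiddenname"]'
       . '[contains(preceding-sibling::text()[1], "mywordshere")]';
$nodes = $xpath->query($query);

$value = $nodes->length > 0 ? $nodes->item(0)->getAttribute('value') : null;
echo $value, "\n";
```

Unlike the string-function version, this does not depend on the order of attributes inside the tag, but it does depend on where the marker sits in the tree.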

zombat
I can't rework it; I'm using WordPress, not a custom CMS or anything like that.
Phil
Is DOM usually enabled on PHP installs, or do I need to enable it?
Phil
It's part of the PHP core, but it depends on your installation. If it's not enabled, you need to make sure you have php5-xml installed (for rpm-based packages), or recompile with the --enable-dom flag. You can tell if it's installed by checking the output of `phpinfo()`, or `get_loaded_extensions()`, or running `php -m` from the command line.
zombat
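
The availability check from the last comment can also be done from a script. This is a small sketch using two standard PHP functions; the `$hasDom` variable name is just for illustration.

```php
<?php
// extension_loaded() checks the extension by name; class_exists() confirms the class is usable.
$hasDom = extension_loaded('dom') && class_exists('DOMDocument');

echo $hasDom
    ? "DOM extension is available\n"
    : "DOM extension is missing; install or enable it first\n";
```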