I've gotten some great help here, and I am so close to solving my problem that I can taste it. But I seem to be stuck.

I need to scrape a simple form from a local web server and return only the lines that match a user's local email (e.g. onemyndseye@localhost). simplehtmldom makes easy work of extracting the correct form element:

foreach($html->find('form[action*="delete"]') as $form) echo $form;

Returns:

<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email: 
        onemyndseye@localhost (Default)
    ]<br />         
    <input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
        http://www.ubuntu.com/rss.xml
    </a> [email: 
        onemyndseye@localhost (Default)
    ]<br />         
<input type="submit" name="delete_submit" value="Delete Selected" /></form>

However, I am having trouble with the next step: returning only the lines that contain 'onemyndseye@localhost', with that text removed, so that only the following is returned:

<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">http://www.linux.com/rss/feeds.php</a> <br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">http://www.ubuntu.com/rss.xml</a> <br />

Thanks to the wonderful users of this site, I've gotten this far and can even return just the links, but I am having trouble getting the rest. It's important that the complete <input> tags are returned EXACTLY as shown above, as the id and name values will need to be passed back to the original form in POST data later on.

Thanks in advance!

***** EDIT ******

The issue is close to solved now, thanks to Yacoby. The last small hurdle is that some trash is left behind by the str_ireplace. Perhaps it would be easier to remove all text between </a> and <br />?

After Yacoby's additions the output is as follows:

<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email: 
         (Default)
    ]<br />         
    <input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
        http://www.ubuntu.com/rss.xml
    </a> [email: 
         (Default)
    ]<br />         
    <input type="checkbox" id="D3" name="D3" /><a href="http://mythbuntu.org/rss.xml">
        http://mythbuntu.org/rss.xml
    </a> [email: 

    ]<br />         
<input type="submit" name="delete_submit" value="Delete Selected" /></form>

Notice that [email: (Default)] and [email: ] have been left behind. I would also need to remove the form action and submit lines at the end, but I think I can work that part out from the previous suggestion.

***** SOLVED ****

Issue solved with:

$html = file_get_html('http://localhost:9000/');
foreach ($html->find('form[action*="delete"]') as $form) {
    if (stripos($form->innertext, 'onemyndseye@localhost') !== false) {
        // Drop everything between </a> and <br /> (the "[email: ...]" text).
        $form = preg_replace('!</a>.*?<br />!s', '</a><br />', $form);
        echo $form;
    }
}
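
For anyone who also needs to keep only the entries that belong to one address, here is a minimal sketch of one way it might be done on the form's HTML string, without simplehtmldom. The function name entriesForEmail, the regular expression, and the someoneelse@localhost entry are my own assumptions for illustration; they are not part of the page being scraped or of any library. It assumes every entry has the shape <input .../><a ...>...</a> text <br /> as in the output above.

```php
<?php
// Hypothetical helper (not from any library): return only the entries that
// belong to $email, with the text between </a> and <br /> removed.
function entriesForEmail(string $formHtml, string $email): array
{
    // One entry = checkbox <input>, the <a>...</a> link, trailing text, <br />.
    $pattern = '!(<input\b[^>]*type="checkbox"[^>]*/>)\s*(<a\b[^>]*>)(.*?)</a>(.*?)<br />!s';
    preg_match_all($pattern, $formHtml, $matches, PREG_SET_ORDER);

    $entries = [];
    foreach ($matches as $m) {
        // $m[4] is the text between </a> and <br />, e.g. "[email: user@host (Default)]".
        if (stripos($m[4], $email) === false) {
            continue; // this entry belongs to a different address
        }
        // Keep the <input> and <a> tags exactly as they appeared in the source.
        $entries[] = $m[1] . $m[2] . trim($m[3]) . '</a> <br />';
    }
    return $entries;
}

// Sample input modelled on the form above; the second address is made up.
$formHtml = <<<HTML
<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email:
        onemyndseye@localhost (Default)
    ]<br />
    <input type="checkbox" id="D3" name="D3" /><a href="http://mythbuntu.org/rss.xml">
        http://mythbuntu.org/rss.xml
    </a> [email:
        someoneelse@localhost
    ]<br />
<input type="submit" name="delete_submit" value="Delete Selected" /></form>
HTML;

$entries = entriesForEmail($formHtml, 'onemyndseye@localhost');
echo implode("\n", $entries), "\n";
```

Because only the matched entries are returned, the <form> and submit lines are left out as a side effect. A regular expression like this is fragile if the markup changes, so treat it as a starting point rather than a robust parser.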

Thanks for the help!

A: 

Maybe something like

if ( stripos($form->innertext, 'onemyndseye@localhost') !== false ){
    $form->innertext = str_ireplace('onemyndseye@localhost', '', $form->innertext);
    echo $form;
}

This won't work with HTML like

<b>onemyndseye</b>@localhost

Using plaintext, it is easy to check whether the text with tags removed contains a string, but it is far harder to replace that string in the underlying markup.
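
A small sketch of that difference, using PHP's built-in strip_tags as a stand-in for simplehtmldom's plaintext (that substitution, and the sample markup string, are assumptions made for illustration):

```php
<?php
// Sample markup where a tag splits the address, as in the example above.
$markup = '[email: <b>onemyndseye</b>@localhost (Default)]';
$needle = 'onemyndseye@localhost';

// Searching the raw markup misses the address because of the <b> tag.
$foundInMarkup = stripos($markup, $needle) !== false;

// Searching the tag-stripped text finds it.
$foundInText = stripos(strip_tags($markup), $needle) !== false;

// Replacing in the raw markup changes nothing, for the same reason.
$replaced = str_ireplace($needle, '', $markup);

var_dump($foundInMarkup, $foundInText, $replaced === $markup);
```

So detection can work on the stripped text, but the replacement still has to deal with the tags in between.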

Yacoby
Hmm... I added that to the foreach, and matching onemyndseye@localhost seems to always return false :/ It gives me a few ideas to try, but nothing solid yet. Thanks for the reply! Out of curiosity, I reversed the true/false and 'innerhtml=""' was added to the form element.
onemyndseye
@onemyndseye I made a mistake in the code. It should work now. (I used `innerhtml` rather than `innertext`.)
Yacoby
WOW! Thanks, that works nicely. I have edited the original post to reflect your suggestion and show the last needed tweaks.
onemyndseye
Solved. Thanks for the help, Yacoby!
onemyndseye