
I have a page, say abc.html, that has a small form with some fields.

<form name="form" method="post" action="abc.html">.......................</form>

When we submit the form, it comes back to abc.html with the data posted, and the page shows the resulting names produced by processing that data.

Throughout the whole process the page URL stays the same. Now I want to parse abc.html as it looks after the form has been submitted, with that data in it. I have parsed pages before where the URL itself contains all the data, but not a page like this, where the data only appears on the page after the form is submitted. How can I parse such a page?

A: 

Well, to get the correct HTML from the server, you have to send a POST request containing the form data. Then you can parse the server response.
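For reference, here is a minimal sketch of that POST request using PHP's cURL extension. It assumes the form's action URL and field names are known (the ones in the usage line are hypothetical, since the real form isn't shown) and that the site doesn't require a session cookie; if it does, cURL's cookie options would also be needed.

```php
<?php
// Sketch: act like the browser by POSTing the form fields to the URL named in
// the form's action attribute, and return the HTML the server sends back.
// Requires the cURL extension. The URL and field names are hypothetical.
function postForm(string $url, array $fields): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,                      // same as method="post"
        CURLOPT_POSTFIELDS     => http_build_query($fields), // urlencoded body, like a browser sends
        CURLOPT_RETURNTRANSFER => true,                      // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,                      // follow a redirect after the POST, if any
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException("POST to $url failed: " . curl_error($ch));
    }
    return $html;
}

// Usage (hypothetical URL and field values, mirroring <form action="abc.html">):
// $html = postForm('http://example.com/abc.html', ['foo' => 'bar', 'submit' => 'Press Me']);
```

The returned string is the "after submission" page, so it can be handed to DOMDocument like any other HTML.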

gnud
I know that... but I don't know how to send the request through the form and then fetch the result using PHP :-(
developer
I am not entirely sure I understood your question and what you are trying to do, but did you know that you can send POST requests via cURL and fetch the server's response? http://php.net/manual/en/book.curl.php
Max
A: 

Parsing the HTML file is the same as us looking at it. The HTML page rendered after posting the data will have some HTML element in which the additional text is displayed. When you parse the page, check whether this element or container exists; if it does, read the rest of the data from it. The page displayed without the posted data will not have this additional element or container.
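A rough sketch of that container check with DOMDocument and DOMXPath. It assumes, purely for illustration, that the results are `<li>` items inside an element with `id="results"`; the real page's markup has to be inspected to find the actual container.

```php
<?php
// Sketch: decide whether a page is the "after submission" version by looking for
// the element that holds the processed results, then read the data out of it.
// The id "results" and the <li> items are hypothetical placeholders.
function extractResults(string $html, string $containerId = 'results'): ?array
{
    if (trim($html) === '') {
        return null; // nothing to parse (DOMDocument::loadHTML rejects an empty string)
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $container = $xpath->query('//*[@id="' . $containerId . '"]')->item(0);
    if ($container === null) {
        return null; // no container: this is the page as shown before the form was posted
    }

    $names = [];
    foreach ($xpath->query('.//li', $container) as $item) {
        $names[] = trim($item->textContent);
    }
    return $names;
}
```

A `null` result means the HTML you fetched is the plain page, which usually means the form data never reached the server.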

Edit: Look at this question: PHP Screen Scraping and Sessions

Shoban
But how do I get to the URL that contains the additional data? The URL stays the same throughout the process.
developer
A: 

First of all, your page should be abc.php; otherwise the server (in a default setup) will not run any PHP code in it.

Second, here is some code that will (I hope) help you out. Copy and paste this example into abc.php:

<html>
<head></head>
<body>
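<?php
// Only runs after the form has been submitted back to this same page
if (isset($_POST['submit'])) {
  // Escape the posted value before echoing it, so user input can't inject HTML
  echo 'You posted the following value: ' . htmlspecialchars($_POST['foo']);
}
?>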
<form name="form" action="abc.php" method="post">
<input type="text" name="foo" value="" />
<input type="submit" name="submit" value="Press Me" />
</form>
</body>
</html>

If that is not what you mean, and you want to parse HTML the way you would parse XML, use PHP's DOMDocument class:

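$oDom = new DOMDocument();
$oDom->loadHTML($sHTMLstring);     // parse HTML held in a string
// or
$oDom->loadHTMLFile($sFileName);   // parse HTML read from a file
// now you can walk the DOM, e.g.
$oForms = $oDom->getElementsByTagName('form'); // DOMNodeList of every <form> element
$oForm  = $oForms->item(0);                    // the first <form>, or null if there is none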

http://nl.php.net/manual/en/domdocument.loadhtml.php
http://nl.php.net/manual/en/domdocument.loadhtmlfile.php
http://nl.php.net/manual/en/domdocument.getelementsbytagname.php

Hope this helps

Robert Cabri
I think you haven't understood my problem... I have to parse abc.html, which contains data that is displayed after a particular form is submitted. I want to parse the data that appears after the form is submitted.
developer
Well, yes, I don't understand. Could you elaborate on this? Is abc.html generated? Which page should do the parsing? Which part has to be parsed? Please give some more detail.
Robert Cabri
A: 

Good question, but I don't think it's possible with PHP. My company does this with a very advanced tool written in C: it grabs any page, submits any form and gets the response HTML. But maybe you can find some tools for it. I don't know.

Kamilos
A: 

I think the point here is that you can't just open the URL and read the HTML that comes back. You will have to play the part of the browser in order to interact with the server side form. To do this, you'll have to write your own code to HTTP POST the form input data. The HTTP response to your POST will contain the generated HTML, which you can then parse for the processed results.
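One way to "play the part of the browser" without the cURL extension is PHP's built-in http:// stream wrapper with a POST context. This is a sketch only: it assumes `allow_url_fopen` is enabled, and the URL and field names in the usage line are hypothetical.

```php
<?php
// Sketch of an HTTP POST using PHP's http:// stream wrapper instead of cURL.
// Requires allow_url_fopen to be enabled.

// Builds the stream-context options for a urlencoded form POST.
function formPostOptions(array $fields): array
{
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($fields), // same body format a browser sends
        ],
    ];
}

// Sends the POST and returns the generated HTML, which can then be parsed.
function fetchAfterPost(string $url, array $fields): string
{
    $context = stream_context_create(formPostOptions($fields));
    $html = file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("POST to $url failed");
    }
    return $html;
}

// Usage (hypothetical URL and fields):
// $html = fetchAfterPost('http://example.com/abc.html', ['foo' => 'bar', 'submit' => 'Press Me']);
```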

Paul McGuire
A: 

If you want to send the form to the web server (i.e. "fill" it in first), you need something similar to Perl's WWW::Mechanize. See this question for possible ways to do this. Afterwards, you need to parse the resulting page, and that depends heavily on the site in question: one site might use named elements that you can easily retrieve using regular expressions, while a different site might not, making it much harder to get the values you're interested in.
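As a rough, Mechanize-style illustration of the "fill it first" step in plain PHP: read the form's current field values from the page, then override the ones you want to change. This is a simplified sketch, not a port of WWW::Mechanize; it only looks at `<input>` and `<textarea>` (no `<select>`), and it keeps every named submit button, whereas a browser sends only the one that was clicked.

```php
<?php
// Sketch: collect the named fields of a form and their default values, then
// apply the caller's values on top. The result can go to http_build_query().
function fillForm(string $html, array $overrides, int $formIndex = 0): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $form = $dom->getElementsByTagName('form')->item($formIndex);
    if ($form === null) {
        throw new InvalidArgumentException("No form at index $formIndex");
    }

    $fields = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('.//input[@name] | .//textarea[@name]', $form) as $el) {
        $type = strtolower($el->getAttribute('type'));
        // A browser does not submit unchecked checkboxes or radio buttons either.
        if (in_array($type, ['checkbox', 'radio'], true) && !$el->hasAttribute('checked')) {
            continue;
        }
        $fields[$el->getAttribute('name')] = $el->nodeName === 'textarea'
            ? $el->textContent
            : $el->getAttribute('value');
    }
    // Caller-supplied values win over the page's defaults.
    return array_replace($fields, $overrides);
}
```

For example, with the abc.php form from an earlier answer, `fillForm($page, ['foo' => 'hello'])` yields `['foo' => 'hello', 'submit' => 'Press Me']`, ready to be POSTed.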

bluebrother