Hi there,

I'm using PHP to scrape data from another website. However, on certain occasions I need to confirm a variable (because there are two very similar possibilities).

The button I'm supposed to click to confirm my variable is:

<input type="submit" class="buttonEmphasized confirm_nl" name="start" value="Bevestig"  accesskey="s" />

However, adding &start=Bevestig to the URL doesn't seem to solve the problem; I just receive the same page again. What's more, the website uses sessions, and every http_post_data call seems to start a new session.

Is there a way to let PHP "click" a button if a certain output is missing?

This is a train timetable scraping system (using the HAFAS system).

Cheers

+1  A: 

There is no generalized solution for this problem; every site is different in some way. Your best bet is to analyze the HTTP message sent by the original page. You can do that with Firefox + Firebug + Live HTTP Headers, for example. This way you will see all the parameters (required or not), and you can then replicate the message with your script.

It might (most likely will) require faking session/cookie data. You might need to use cURL for that.
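A minimal sketch of how that could look with PHP's cURL extension, assuming the site only needs its own cookies sent back. The function names, the URLs passed in, and the `$expected` marker string are hypothetical; only the `start=Bevestig` field comes from the question. The same cookie file is used for reading and writing, so the POST carries the session cookie set by the first request.

```php
<?php
// Builds the cURL options for one request in a cookie-preserving session.
// $postFields === null means a plain GET. (Hypothetical helper.)
function buildCurlOptions(string $url, string $cookieFile, ?array $postFields = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,         // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects like a browser
        CURLOPT_COOKIEFILE     => $cookieFile,  // send cookies stored in this file
        CURLOPT_COOKIEJAR      => $cookieFile,  // write received cookies here when the handle is released
    ];
    if ($postFields === null) {
        $options[CURLOPT_HTTPGET] = true;
    } else {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }
    return $options;
}

// Hypothetical flow: load the page, then "click" the confirm button
// only if the expected output is missing.
// Defined but not called here, since it needs network access.
function fetchConfirmed(string $pageUrl, string $actionUrl, string $expected): string
{
    $cookieFile = tempnam(sys_get_temp_dir(), 'cookies');
    $ch = curl_init();

    curl_setopt_array($ch, buildCurlOptions($pageUrl, $cookieFile));
    $html = (string) curl_exec($ch);

    if (strpos($html, $expected) === false) {
        // Same handle, same cookie file: the POST belongs to the same session.
        $fields = ['start' => 'Bevestig'];
        curl_setopt_array($ch, buildCurlOptions($actionUrl, $cookieFile, $fields));
        $html = (string) curl_exec($ch);
    }

    return $html;
}
```

The real form may need more fields than `start`; the captured request in Firebug shows which ones.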

kgb
Tuinslak
@Tuinslak Use Firebug: open the Net tab, select the first request, and you will see a POST tab there. The data might not be only in the URL; something has to be in the POST body, cookie, or session.
kgb
I see some cookie information: http://yeri.be/cd. So I take it that if I do an http_post_data to the URL and add the cookie content in $data, it should work?
Tuinslak
I don't know what data you are talking about. You should add the cookie data as part of the HTTP header, using cURL for example. http_post_data will probably not work, as it doesn't send cookies, only a POST request.
kgb
All right, thanks
Tuinslak
A: 

"Is there a way to let PHP "click" a button if a certain output is missing?"

Nope, PHP is server-side. Use JavaScript.

Hal
Yes, there is. You have to get the form's action, send an HTTP request to that path, and read the response, thus emulating what the browser does.
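As a rough sketch of the "get the form's action" step, PHP's DOM extension can pull the action, method, and named inputs out of the fetched HTML. The function name `extractForm` is hypothetical, and the sketch assumes the first form on the page is the one that matters. A browser sends a submit button's name/value pair only when that button is clicked, so including `start=Bevestig` in the request is what emulates the click.

```php
<?php
// Returns the first form's action, method and named input fields,
// or null if the HTML contains no form. (Hypothetical helper.)
function extractForm(string $html): ?array
{
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Scraped HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return null;
    }

    $fields = [];
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }

    return [
        'action' => $form->getAttribute('action'),
        'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
        'fields' => $fields,
    ];
}
```

The returned `action` is often a relative path and has to be resolved against the page URL before the request is sent.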
kgb
Hmm, never thought of that. Thanks for the heads-up.
Hal
A: 

If the post seems to be starting a new session, I would suspect that you are not respecting the cookies that were provided by the other side.

You need to send the session cookies back in the POST request.

That's also where you should be sending your start field. While many pages accept parameters either in the URL or in the POST body, the two are not equivalent.
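To illustrate, here is a small sketch without cURL, assuming you already have the raw response header lines from the first request (for example, the lines PHP exposes in `$http_response_header` after an HTTP `file_get_contents()` call). Both function names are hypothetical. The cookies go into a `Cookie` request header, and the `start` field goes into the POST body rather than the URL.

```php
<?php
// Collects name => value pairs from raw "Set-Cookie" response header lines.
// Attributes such as path or HttpOnly are ignored. (Hypothetical helper.)
function cookiesFromHeaders(array $headerLines): array
{
    $prefix  = 'Set-Cookie:';
    $cookies = [];
    foreach ($headerLines as $line) {
        if (stripos($line, $prefix) !== 0) {
            continue;
        }
        // Keep only the leading "name=value" part before the first ';'.
        $pair  = trim(explode(';', substr($line, strlen($prefix)), 2)[0]);
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2 && $parts[0] !== '') {
            $cookies[$parts[0]] = $parts[1];
        }
    }
    return $cookies;
}

// Builds stream-context options for a POST that sends the cookies back
// and carries the form fields in the request body. (Hypothetical helper.)
function buildPostOptions(array $cookies, array $fields): array
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . 'Cookie: ' . implode('; ', $pairs) . "\r\n",
            'content' => http_build_query($fields),
        ],
    ];
}
```

The options array can be passed to `stream_context_create()` and the resulting context to `file_get_contents()` as its third argument.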

Oddthinking
I've managed to find a URL that I can copy and paste across multiple browsers, so apparently it's not session/cookie related.
Tuinslak