
I'm trying to write a simple PHP script which automatically sets up new etherpads (see http://etherpad.com/).

They don't have an API (yet) for creating new pads, so I'm trying to figure out whether I can do it another way.

After playing around some, I found that if you point etherpad.com at a pad that doesn't exist yet (by appending a random string to the URL), it comes back with a form asking whether you want to create a new EtherPad at that address. If you submit that form, a new pad is created at that URL.

My thought, then, was that I could write a PHP script using cURL that duplicates that form submission and tricks Etherpad into creating a new pad at whatever URL I give it. I wrote the script, but so far I can't get it working. Can someone tell me what I'm doing wrong?

First, here's the HTML form on the etherpad creation page:

```html
<p><tt id="padurl">http://etherpad.com/lsdjfsljfa-fdj-lsdf</tt></p>

<br/>
<p>There is no EtherPad document here. Would you like to create one?</p>

<input type="hidden" value="lsdjfsljfa-fdj-lsdf" name="padId"/>
<input type="submit" value="Create Pad" id="createPad"/>
```
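The form above submits a single hidden field, `padId`. As a minimal sketch (the helper name `pad_form_body` is hypothetical), the url-encoded body a browser would send for it can be built with PHP's `http_build_query()`, which encodes the value and joins pairs for you:

```php
<?php
// Hypothetical helper: build the application/x-www-form-urlencoded body
// that submitting the form above would send (its only field is padId).
function pad_form_body(string $padId): string
{
    // http_build_query() URL-encodes keys and values; passing '&'
    // explicitly avoids depending on the arg_separator.output ini setting.
    return http_build_query(['padId' => $padId], '', '&');
}

echo pad_form_body('lsdjfsljfa-fdj-lsdf'), "\n"; // padId=lsdjfsljfa-fdj-lsdf
```

This only reproduces the form fields; whether the server accepts the request can also depend on headers and cookies the browser sends alongside them.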

Then here's my code, which tries to submit the form using cURL:

```php
<?php
//set POST variables
$url = "http://etherpad.com/ep/pad/create?padId=ldjfal-djfa-ldkfjal";
$fields = array(
  'padId' => urlencode("ldjfal-djfa-ldkfjal"),
);

$useragent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)";

//url-ify the data for the POST (key=value pairs joined with '&')
$fields_string = '';
foreach ($fields as $key => $value) { $fields_string .= $key . '=' . $value . '&'; }
$fields_string = rtrim($fields_string, '&');
print_r($fields_string);

//open connection (only once, so the user agent set below applies to this handle)
$ch = curl_init();

// set user agent
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

//set the url, POST method, and POST data
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
// return the response instead of printing it, so print_r() shows the page
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

//execute post
$result = curl_exec($ch);
print_r($result);

//close connection
curl_close($ch);
```

When I run the script, PHP reports back that everything executed correctly but etherpad doesn't create my pad. Any clues what's going on?

Answer:

I have not investigated this specific site but I guess there are some important headers which are missing. Here is a very general approach that is applicable for nearly any website:

Use a network sniffer such as Wireshark to capture all connections. Then compare the POST fields the browser sends with yours.

An even easier way is to use Netcat. Just save the page to disk, change the form's action URL to http://localhost:3333/, and run

$ nc -l -p 3333

Now open the local HTML file and fill in the fields appropriately. Immediately you will see all headers that would have been transmitted to the host.
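Once you have the raw request the browser sent (from nc or Wireshark) and the one your script sends, comparing them by eye is easy to get wrong. A minimal sketch of a diff helper; the function names and the sample requests (including the `ES=abc` cookie) are made up for illustration:

```php
<?php
// Hypothetical helper: split a raw HTTP request (as printed by nc) into
// lowercase header name => value. The request line and body are ignored.
function parse_request_headers(string $raw): array
{
    // Normalize CRLF line endings; headers end at the first blank line.
    $raw = str_replace("\r\n", "\n", $raw);
    $head = explode("\n\n", $raw, 2)[0];
    $lines = explode("\n", $head);
    array_shift($lines); // drop the request line, e.g. "POST /path HTTP/1.1"
    $headers = [];
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue;
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return $headers;
}

// Hypothetical helper: headers the browser sent that the script did not
// send, or sent with a different value.
function missing_headers(string $browserRaw, string $scriptRaw): array
{
    return array_diff_assoc(
        parse_request_headers($browserRaw),
        parse_request_headers($scriptRaw)
    );
}

$browser = "POST /ep/pad/create HTTP/1.1\r\nHost: localhost:3333\r\n"
    . "Referer: http://localhost:3333/\r\nCookie: ES=abc\r\n\r\npadId=x";
$script = "POST /ep/pad/create HTTP/1.1\r\nHost: localhost:3333\r\n\r\npadId=x";
var_export(missing_headers($browser, $script));
// array ( 'referer' => 'http://localhost:3333/', 'cookie' => 'ES=abc', )
```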

(There are also extensions for Mozilla Firefox but in general they just slow down the browser without providing much benefit.)

Also read what I have posted on http://stackoverflow.com/questions/1700821/to-auto-fill-a-text-area-using-php-curl/1702025#1702025, as it might help you with your implementation in PHP.

By the way, you are sending the parameter "padId" via both GET and POST. That is not necessary. Check what the Etherpad form actually uses and stick with it.

Ah, I forgot to mention that many websites check the user agent. Unfortunately you cannot use my Netcat approach to find it out, but Wireshark will definitely display it. In general I am setting the user agent to http://<host>/. If this does not work in your special case, it is best to sniff the proper value.
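As a sketch of applying that advice with PHP's cURL extension, the user agent can be set together with other browser-like options such as a Referer and in-memory cookie handling. The helper name is hypothetical, and the Referer and user agent values are assumptions you would replace with sniffed ones:

```php
<?php
// Hypothetical helper: cURL options that make a request look more like the
// browser's form submission. $referer and $userAgent are assumptions;
// sniff the real values as described above.
function browser_like_options(string $referer, string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,  // return the body from curl_exec()
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects after the POST
        CURLOPT_COOKIEFILE     => '',    // '' enables cookies without loading a file
        CURLOPT_REFERER        => $referer,
        CURLOPT_USERAGENT      => $userAgent,
    ];
}
```

Pass the array to `curl_setopt_array($ch, ...)`. Cookie handling only helps if the same handle first fetches the page that sets the cookie and then sends the POST.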

Answer:

My guess is that you're missing the cookies and/or the referrer. It may be checking the referrer to ensure people aren't creating pads without confirmation.

Wireshark will help, but first try adding those to your cURL request and see if it works.

Sundeep

Comment (Kyle Mathews): Yeah, that's what it ended up being. They were checking for a cookie. See my answer.
Answer:

Here's the answer a friend helped me come up with:

They're apparently doing some cookie validation, that's why your script isn't working. You can find this out by loading the new pad creation prompt page, clearing your cookies, and then reloading the page. It won't work. Tricky, but effective for most casual bots.
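To illustrate what that cookie validation requires, here is a simplified sketch of what cURL's cookie engine does between two requests: take the `Set-Cookie` headers from the first response and send them back in a `Cookie` header. The function name and sample response are hypothetical, and it ignores Domain, Path, Expires and other attributes, so it is not a full cookie jar:

```php
<?php
// Hypothetical helper: turn the Set-Cookie headers of a raw response into
// the value of a Cookie request header ("name=value; name2=value2").
function cookie_header_from_response(string $rawHeaders): string
{
    $pairs = [];
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // "Set-Cookie: name=value; Path=/; HttpOnly" -> "name=value"
        $value = trim(substr($line, strlen('Set-Cookie:')));
        $pairs[] = trim(explode(';', $value, 2)[0]);
    }
    return implode('; ', $pairs);
}

$response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
    . "Set-Cookie: ES=abc123; Path=/; HttpOnly\r\nSet-Cookie: lang=en\r\n\r\n";
echo cookie_header_from_response($response), "\n"; // ES=abc123; lang=en
```

The result could be sent with `curl_setopt($ch, CURLOPT_COOKIE, ...)`, but enabling `CURLOPT_COOKIEFILE` on a reused handle, as the script that follows does, handles this automatically.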

Here's a script that gets around the limitation. Just insert your desired $padId and away you go.

```php
<?php

$padId = 'asdfjklsjfgadslkjflskj';

$ch = curl_init();

# for debugging: include response headers in the output
curl_setopt($ch, CURLOPT_HEADER, true);
# return the response from curl_exec() instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

# parse cookies and follow all redirects
curl_setopt($ch, CURLOPT_COOKIEFILE, '/dev/null');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

# first, GET the pad page to receive a cookie
curl_setopt($ch, CURLOPT_URL, 'http://etherpad.com/' . urlencode($padId));
$result = curl_exec($ch);
echo $result;

# next, POST to actually create the etherpad (the cookie is sent back automatically)
curl_setopt($ch, CURLOPT_URL, 'http://etherpad.com/ep/pad/create');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'padId=' . urlencode($padId));
$result = curl_exec($ch);
echo $result;

curl_close($ch);
```
Kyle Mathews