
I'm trying to write a tool to check if a proxy server is up and available for use. So far, I've come up with two methods in the class below (I've removed setters and getters that are superfluous to this question).

The first method uses cURL and tries to request a page via the proxy; the second uses fsockopen and just tries to open a connection to the proxy.

class ProxyList {
    /**
     * You could set this to localhost, depending on your environment
     * @var string The URL that the proxy validation method will use to check proxies against
     * @see ProxyList::validate()
     */
    const VALIDATION_URL = "http://m.www.yahoo.com/robots.txt";
    const TIMEOUT        = 3;

    private static $valid = array(); // Checked and valid proxies
    private $proxies      = array(); // An array of proxies to check

    public function validate($useCache=true) {
        $mh       = curl_multi_init();
        $ch       = null;
        $handles  = array();
        $running  = null;
        $proxies  = array();
        $response = null;

        foreach ( $this->proxies as $p ) {
            // Using the cache and the proxy already exists?  Skip the rest of this crap
            if ( $useCache && !empty(self::$valid[$p]) ) {
                $proxies[] = $p;
                continue;
            }

            $ch = curl_init();
            curl_setopt($ch, CURLOPT_HTTP_VERSION,    CURL_HTTP_VERSION_1_1);
            curl_setopt($ch, CURLOPT_URL,             self::VALIDATION_URL);
            curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
            curl_setopt($ch, CURLOPT_PROXY,           $p);
            curl_setopt($ch, CURLOPT_NOBODY,          true); // Also sets request method to HEAD
            curl_setopt($ch, CURLOPT_HEADER,          false);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION,  true);
            curl_setopt($ch, CURLOPT_TIMEOUT,         self::TIMEOUT);

            curl_multi_add_handle($mh, $ch);
            $handles[$p] = $ch;
        }

        // Execute the multi-handle. Instead of sleeping for a fixed delay that
        // grows with the number of proxies, block in curl_multi_select() until
        // one of the transfers has socket activity (or 1 second passes).
        do {
            $mrc = curl_multi_exec($mh, $running);
            if ( $running ) {
                curl_multi_select($mh, 1.0);
            }
        } while ( $running && $mrc == CURLM_OK );

        // Get the results of the requests
        foreach ( $handles as $proxy => $ch ) {
            $status = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);

            // Great success
            if ( $status >= 200 && $status < 300 ) {
                self::$valid[$proxy] = true;
                $proxies[] = $proxy;
            }
            else {
                self::$valid[$proxy] = false;
            }

            // Cleanup individual handle
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }

        // Cleanup multiple handle
        curl_multi_close($mh);

        return $this->proxies = $proxies;
    }

    public function validate2($useCache=true) {
        $proxies = array();

        foreach ( $this->proxies as $proxy ) {
            // Using the cache and the proxy already exists?  Skip the rest of this crap
            if ( $useCache && !empty(self::$valid[$proxy]) ) {
                $proxies[] = $proxy;
                continue;
            }

            list($host, $port) = explode(":", $proxy);

            if ( $conn = @fsockopen($host, $port, $errno, $error, self::TIMEOUT) ) {
                self::$valid[$proxy] = true;
                $proxies[] = $proxy;
                fclose($conn);
            } else {
                self::$valid[$proxy] = false;
            }
        }

        return $this->proxies = $proxies;
    }
}

So far, I prefer the cURL method since it lets me check large batches of proxies in parallel, which is wicked fast, instead of one at a time as with fsockopen.

I haven't done much work with proxies, so it's hard for me to tell whether either of these methods is sufficient for validating that a proxy is available, or whether there is a better method that I'm missing.

A:

Hm. Trying to establish a connection to a safe (most probably available) URL through the proxy and checking for errors sounds OK to me.

For maximum reliability, you may want to add a call to a second validation URL (e.g. something at Google), or simply make it two calls, just in case.

Pekka
A second availability check sounds like a good idea, but the more requests that are made, the more performance becomes a concern.
Justin Johnson
True. It depends on the intended use, I guess.
Pekka
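Pekka's suggestion of requiring more than one validation URL could look roughly like this. It is a minimal sketch, not a tested tool: `proxyPassesAllUrls()` and `curlHeadStatus()` are hypothetical names, and the "2xx means success" rule is borrowed from `validate()` in the question. The fetcher is passed in as a callable, so it can be the cURL HEAD request below or a fake when testing.

```php
<?php
/**
 * Hypothetical helper: a proxy only counts as available if it can fetch
 * every URL in $urls. $fetchStatus(string $proxy, string $url): int is
 * injected, so it can be a real cURL request or a fake in tests.
 */
function proxyPassesAllUrls(string $proxy, array $urls, callable $fetchStatus): bool
{
    foreach ($urls as $url) {
        $status = (int) $fetchStatus($proxy, $url);
        if ($status < 200 || $status >= 300) {
            return false; // one failed fetch is enough to reject the proxy
        }
    }
    return true;
}

/**
 * Hypothetical fetcher: a HEAD request to $url through $proxy with cURL.
 * Returns the HTTP status code, or 0 if no response arrived (e.g. timeout).
 */
function curlHeadStatus(string $proxy, string $url, int $timeout = 3): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_PROXY          => $proxy,
        CURLOPT_NOBODY         => true, // HEAD request, as in validate()
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => $timeout,
    ));
    curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $status;
}
```

A call would look like `proxyPassesAllUrls('203.0.113.7:8080', array(ProxyList::VALIDATION_URL, 'http://www.google.com/robots.txt'), 'curlHeadStatus')`, where the second URL is only an illustrative choice. This checks one proxy's URLs sequentially; for large batches the extra URL would have to go into the curl_multi loop of `validate()`, which doubles the number of requests, the cost Justin points out.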
A:

cURL is the preferred way, because of curl_multi_exec.

I wouldn't bother doing two checks; just do the Google (or proxy judge) call straight away. Some proxies accept socket connections but won't fetch a thing, so your cURL method is more reliable and not that slow.

As Pekka above mentions: it depends on the intended use.

If you used Charon and harvested a load of proxies, I would want them checked against a proxy judge, and I would want to know the turnaround time (to avoid slow proxies) and the anonymity level.

If you want to use it as a monitoring system for corporate proxies, I would just want to make sure it can fetch a page.

A (chaotic) example of checking a proxy by fetching a URL with cURL: http://www.oooff.com/php-affiliate-seo-blog/php-automation-coding/easy-php-proxy-checker-writing-tutorial/

TL;DR: use cURL. It can handle parallel requests and is the most stable option without being too slow (as long as you skip the double check).

Deefjuh
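Deefjuh's two extra criteria, turnaround time and anonymity, could be sketched as below. Several assumptions here are not from the thread: the timings would be collected with `curl_getinfo($ch, CURLINFO_TOTAL_TIME)` next to `CURLINFO_HTTP_CODE` in the results loop of `validate()`; the proxy judge is assumed to be a script that echoes back the request headers it received as a name => value array; and the transparent/anonymous/elite labels are the informal convention used by proxy lists, not a standard. `fastestProxies()` and `classifyAnonymity()` are hypothetical names.

```php
<?php
/**
 * Hypothetical helper: keep proxies whose check returned 2xx within
 * $maxSeconds, fastest first. $results maps "host:port" => array(
 * 'status' => HTTP code, 'time' => seconds ), e.g. filled from
 * curl_getinfo() with CURLINFO_HTTP_CODE and CURLINFO_TOTAL_TIME.
 */
function fastestProxies(array $results, float $maxSeconds): array
{
    $ok = array();
    foreach ($results as $proxy => $r) {
        if ($r['status'] >= 200 && $r['status'] < 300 && $r['time'] <= $maxSeconds) {
            $ok[$proxy] = $r['time'];
        }
    }
    asort($ok); // ascending by turnaround time, keys (proxies) preserved
    return array_keys($ok);
}

/**
 * Hypothetical helper: classify a proxy from the headers a proxy judge
 * echoed back. Assumes header names are either as sent ("X-Forwarded-For")
 * or in $_SERVER style ("HTTP_X_FORWARDED_FOR").
 */
function classifyAnonymity(array $echoedHeaders, string $realIp): string
{
    // Headers that commonly reveal that a proxy was involved
    $revealing = array('via', 'x-forwarded-for', 'forwarded', 'proxy-connection', 'x-real-ip');
    // Match the IPv4 address as a whole token, so 1.2.3.4 does not match 11.2.3.45
    $ipPattern = '/(?<![\d.])' . preg_quote($realIp, '/') . '(?![\d.])/';
    $sawProxyHeader = false;

    foreach ($echoedHeaders as $name => $value) {
        $name = strtolower(str_replace('_', '-', preg_replace('/^HTTP_/i', '', (string) $name)));
        if (preg_match($ipPattern, (string) $value)) {
            return 'transparent'; // the target can see our real address
        }
        if (in_array($name, $revealing, true)) {
            $sawProxyHeader = true;
        }
    }
    return $sawProxyHeader ? 'anonymous' : 'elite';
}
```

For example, `classifyAnonymity(array('HTTP_VIA' => '1.1 squid'), '203.0.113.7')` returns `'anonymous'`: the proxy announces itself but does not leak the client address. Header-based classification is only a heuristic, since a proxy can leak the address in ways a judge script does not see.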