We have a registration form where people can sign up to take surveys for a small compensation. Recently we found a lot of suspect entries. I tracked down a site in Chinese, which I translated via Google, that was basically a "how-to" for signing up to these sorts of sites. Since then I've been working on a way to automatically filter out the bogus entries.

The registration form has a CAPTCHA to block non-humans, but in many cases the data being entered is extremely realistic. The survey is for bartenders, and all the fields are filled out with legitimate locations and addresses. The phone numbers may be off, but that could just be someone using a cell phone who moved into the area. I've been trying to screen entries by capturing the IP address and country data with the following function:

// This helper is needed because allow_url_fopen is disabled in php.ini on many hosts,
// which stops file_get_contents() from fetching remote URLs.
function my_file_get_contents($file_path) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $file_path);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1);    // give up quickly if the lookup service is down
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);           // cap the total request time as well
    $buffer = curl_exec($ch);                       // false on failure
    curl_close($ch);
    return $buffer === false ? '' : $buffer;
}

function getInfoFromIP() {
    // Get the client IP, preferring proxy headers when present.
    // Note: HTTP_CLIENT_IP and HTTP_X_FORWARDED_FOR are sent by the client and can be forged;
    // only REMOTE_ADDR comes from the TCP connection itself.
    if (!empty($_SERVER['HTTP_CLIENT_IP'])) {               // shared IP
        $real_ip = $_SERVER['HTTP_CLIENT_IP'];
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {   // IP is from a proxy
        // X-Forwarded-For may hold a comma-separated chain; the first entry claims to be the client
        $parts = explode(',', $_SERVER['HTTP_X_FORWARDED_FOR']);
        $real_ip = trim($parts[0]);
    } else {
        $real_ip = $_SERVER['REMOTE_ADDR'];
    }

    // Verify that the value is a well-formed IPv4 address before using it
    if (filter_var($real_ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        trigger_error("Invalid IP passed: " . $real_ip, E_USER_ERROR);
    }

    $ipDetailArray = array();          // initialize a blank array
    $ipDetailArray['ip'] = $real_ip;   // assign the IP address to the array

    // Get the XML result from hostip.info using the custom lookup function
    $xml = my_file_get_contents("http://api.hostip.info/?ip=" . urlencode($real_ip));

    // Extract the country name from <countryName>INFO</countryName> (empty string if absent)
    $ipDetailArray['country'] = preg_match("@<countryName>(.*?)</countryName>@si", $xml, $countryInfoArray)
        ? $countryInfoArray[1] : '';

    // Extract the country code from <countryAbbrev>XX</countryAbbrev> (empty string if absent)
    $ipDetailArray['country_code'] = preg_match("@<countryAbbrev>(.*?)</countryAbbrev>@si", $xml, $ccInfoArray)
        ? $ccInfoArray[1] : '';

    // Return the array containing the IP, country and country code
    return $ipDetailArray;
}

Then I've been manually checking for and removing the entries that show up outside the US (which is where the bar and the survey takers must be located to participate). I'm STILL finding lots of suspect entries listed with US-based IPs (which I'm sure are spoofed).
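One likely gap is in how getInfoFromIP() picks the address: HTTP_CLIENT_IP and HTTP_X_FORWARDED_FOR are ordinary request headers, so a client can put any US address in them and the lookup will report that address instead of the connection's real origin. Below is a minimal sketch of a stricter selection, assuming you know the addresses of any reverse proxies or load balancers in front of your own server. The getClientIp() name and the $trustedProxies list are hypothetical. It will not catch people who route through a genuine US proxy or VPN, because REMOTE_ADDR is then a real US address.

```php
<?php
// Hypothetical helper: pick the client IP without trusting forgeable headers.
// $server is normally $_SERVER; $trustedProxies lists YOUR OWN proxies/load balancers (assumption).
function getClientIp(array $server, array $trustedProxies = array()) {
    $remote = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';

    // Only honour X-Forwarded-For when the connection really came from one of our proxies.
    if (in_array($remote, $trustedProxies, true) && !empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header is a comma-separated chain; walk it from the right, skipping our own proxies.
        $chain = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
        for ($i = count($chain) - 1; $i >= 0; $i--) {
            if (!in_array($chain[$i], $trustedProxies, true)) {
                // First address not added by our own infrastructure; fall back if it is garbage.
                return filter_var($chain[$i], FILTER_VALIDATE_IP) !== false ? $chain[$i] : $remote;
            }
        }
    }

    // Otherwise use the TCP peer address; HTTP_CLIENT_IP is ignored entirely.
    return $remote;
}
```

With no proxy in front of the site, `getClientIp($_SERVER)` simply returns REMOTE_ADDR, which is the value that would then be passed to the hostip.info lookup.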

I'm not sure whether my code is incomplete or whether there's a better solution I haven't been able to find. Thanks!

A: 

Don, we do something rather similar. Here are some of the things we've had to resort to:

  1. Isolate the page as its own virtual server. Use Apache to block repeat offenders.
  2. A CAPTCHA is a good start, but if they are getting past it, you've got a problem. Consider strengthening it with things that are much harder for a robot to bypass, such as obfuscated graphics or human-challenge questions. If it continues, then you've got some determined people on your hands.
  3. Periodically change the page name. It may block people who were following a "how-to" link.
  4. Insert Google Analytics and watch the traffic. It can help you spot patterns and times of day when issues are apparent. Sometimes it can lead to even more interesting workarounds.
  5. Scrutinize the logs. Check IP addresses using an online tool. Report offenders to their ISPs.
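Points 1 and 5 can be partly automated. Here is a minimal sketch that counts submissions per IP address (or per IPv4 /24 range, since sign-up farms often sit in one block) and lists the ones over a threshold for review. It assumes you can load the stored client IPs from your registration table or access log; the findRepeatOffenders() name and the threshold are hypothetical.

```php
<?php
// Hypothetical helper: find IPs (or /24 ranges) that submitted the form more than $maxPerKey times.
// $ips is a list of client IP strings, e.g. loaded from the registration table (assumption).
function findRepeatOffenders(array $ips, $maxPerKey = 1, $bySubnet = false) {
    $keys = array();
    foreach ($ips as $ip) {
        if ($bySubnet && filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
            // Collapse an IPv4 address to its /24 range, e.g. 203.0.113.5 -> 203.0.113.0/24
            $octets = explode('.', $ip);
            $keys[] = $octets[0] . '.' . $octets[1] . '.' . $octets[2] . '.0/24';
        } else {
            $keys[] = $ip;
        }
    }

    $counts = array_count_values($keys);            // key => number of submissions
    $offenders = array_filter($counts, function ($n) use ($maxPerKey) {
        return $n > $maxPerKey;                     // keep only keys over the threshold
    });
    arsort($offenders);                             // worst offenders first
    return $offenders;
}
```

The resulting addresses or ranges could then be reviewed by hand and, if confirmed, denied in Apache 2.4 with a `Require not ip` line inside a `<RequireAll>` block.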

Perhaps check whether their browsers support geolocation, and have a go at that (http://www.browsergeolocation.com/). Blocking by location is tough, though, because so many hackers have zombie computers at their disposal, and information such as area codes is so portable these days.
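If the form used the browser's Geolocation API and posted the coordinates back, the server could do a coarse sanity check on them. Below is a minimal sketch, assuming hypothetical form fields `geo_lat` and `geo_lng`. The rectangle is an approximate box around the contiguous 48 states only, so it also covers slivers of Canada and Mexico and excludes Alaska and Hawaii. Browser-reported coordinates come from the client, so they can be refused or faked just like headers.

```php
<?php
// Hypothetical check for coordinates reported by the browser Geolocation API
// and posted back in (assumed) form fields 'geo_lat' and 'geo_lng'.
// The bounds are an approximate rectangle around the contiguous US (assumption).
function coordsLookContiguousUs($lat, $lng) {
    if (!is_numeric($lat) || !is_numeric($lng)) {
        return null; // permission denied, fields empty, or values tampered with
    }
    $lat = (float) $lat;
    $lng = (float) $lng;
    return $lat >= 24.4 && $lat <= 49.4 && $lng >= -124.8 && $lng <= -66.9;
}
```

Usage would look like `coordsLookContiguousUs($_POST['geo_lat'], $_POST['geo_lng'])`, with `null` treated as "unknown" rather than as proof of fraud.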

bpeterson76
I'm already using the reCAPTCHA script to verify that it's a human. I think they ARE real people; they're just not the market we're soliciting for the paid survey. I also use a script to look up the geolocation of the IP, but many come back as US when I'm 99% sure, from scrutinizing what's entered, that they're not. I'm not well versed enough in how they're spoofing the IP to know whether there's something there I could trap. I appreciate the insight, though. Maybe I need a better geolocation script? Also, changing the page location isn't an option because we market the URL via postcards.
Don
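Since IP geolocation alone keeps reporting US for entries that look wrong, one option is to combine several weak signals into a score and send high-scoring registrations to manual review instead of hard-blocking on any single check. A minimal sketch follows; the field names, weights, and threshold are all hypothetical and would need tuning against entries you have already judged by hand.

```php
<?php
// Hypothetical scoring sketch: combine weak signals into one "needs review" score.
// Every field name and weight below is an assumption, not a recommended value.
function suspicionScore(array $reg) {
    $score = 0;
    if (strtoupper($reg['ip_country_code']) !== 'US') {
        $score += 3;   // the IP lookup places the connection outside the US
    }
    if (!empty($reg['ip_state']) && strcasecmp($reg['ip_state'], $reg['form_state']) !== 0) {
        $score += 1;   // the IP's state differs from the state typed into the form
    }
    if (!empty($reg['used_proxy_header'])) {
        $score += 1;   // the request carried X-Forwarded-For or Client-IP headers
    }
    if ($reg['registrations_from_ip'] > 1) {
        $score += 2;   // the same IP has already registered before
    }
    return $score;
}
```

A registration might then be flagged with something like `suspicionScore($reg) >= 3`; keeping the score in the database also makes it easy to see which signals actually correlate with the bogus entries.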