Several years ago I developed a website for a wholesale company that wanted to keep its online catalog (and wholesale pricing) private. Short of manually reviewing each submitted application, I was stuck on how to do this accurately.

This has also come up in other web projects that require registration to access certain "slightly secret" information, such as sections for members of the press and for dealers. Because most of these applications received very few actual submissions, it was easy to dismiss automatically validating the form as impossible (or too much effort).

Lately, however, there has been a very large increase in registrations, and it would make sense to automate this process if at all possible.

The site is developed in PHP, and I have tried the following:

  • Scraping Dun & Bradstreet to match the business phone number
  • Scraping Yellow Pages to match the phone number/address
  • Basic Regex

The regex is just standard, any-user validation. The scraping was unreliable (and of questionable compliance with those sites' terms and conditions).

How do other developers deal with the issue of business/press validation, and how do they justify this to their clients?

Thanks loads

A: 

How about having them enter the name of their organization and using a web service to verify it? I found this: http://www.business.gov/about/features/api/business-licenses/. It's only for US-based companies, but I suspect there are others like it for other countries.

Pete
This appears to only be valid for US government agencies - not for all registered businesses. I'm starting to think a (complete) reverse EIN database does not exist.
Mahdi.Montgomery
A: 

Another way is to ask them for an email address on their company's domain and send a verification email to it. It does not work for every company, but it can reduce the number of fraud attempts.
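A minimal sketch of the domain check, assuming the form collects both an email address and a company website. The free-mail list and the function names are hypothetical, and the token is assumed to be emailed as part of a confirmation link the applicant must click.

```python
import secrets
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete list of free webmail domains
FREE_MAIL_DOMAINS = {"hotmail.com", "gmail.com", "yahoo.com", "outlook.com", "aol.com"}

def email_domain(address):
    """Return the lowercased domain part of an email address, or '' if malformed."""
    local, sep, domain = address.strip().rpartition("@")
    return domain.lower() if sep and local else ""

def website_domain(url):
    """Return the host of a submitted website URL, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def domain_matches(address, company_url):
    """True if the email is on the company's own domain (or a subdomain of it)."""
    dom = email_domain(address)
    site = website_domain(company_url)
    if not dom or not site or dom in FREE_MAIL_DOMAINS:
        return False
    return dom == site or dom.endswith("." + site)

def new_verification_token():
    """Random, URL-safe token to embed in the confirmation link."""
    return secrets.token_urlsafe(32)
```

A match only shows the applicant controls a mailbox on that domain; it is one signal, not proof of a legitimate business.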

To make this more reliable, you can also check the domain's WHOIS record: http://www.webservicex.net/whois.asmx
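Instead of the web service linked above, a WHOIS lookup can also be done directly over TCP port 43 (the protocol in RFC 3912: send the query, read until the server closes the connection). A minimal sketch; the function names are hypothetical, and `whois.verisign-grs.com` only answers for .com/.net. That registry returns a thin record pointing at the registrar's WHOIS server, and registrant details there are often redacted for privacy, so treat a match with the submitted company name as a bonus rather than a requirement.

```python
import socket

def whois_query(domain, server="whois.verisign-grs.com", timeout=10.0):
    """Send a raw WHOIS query (RFC 3912) and return the reply as text."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_whois(text):
    """Collect 'Key: value' lines into {lowercased key: [values]}; skip comment lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#", ">>>")):
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields
```

For example, `parse_whois(whois_query("example.com")).get("registrar whois server")` gives the server to query next for fuller details.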

If you're paranoid, you can also try connecting to their mail server to check that the address exists; see this tutorial: http://www.coveryourasp.com/ValidateEmail.asp
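The mail-server check boils down to opening an SMTP session and seeing how the server answers `RCPT TO` for the address, without ever sending a message. A minimal sketch using Python's `smtplib`, assuming you have already looked up the domain's MX host (the standard library has no MX lookup) and that outbound port 25 is open from your server. The sender address is a placeholder. Many providers accept every recipient, greylist, or block such probes, so a `None` or `True` result proves little.

```python
import smtplib

def classify_rcpt_reply(code):
    """Map an SMTP reply code for RCPT TO to True (accepted), False (rejected) or None (unknown)."""
    if 200 <= code < 300:
        return True   # e.g. 250: the server says it will accept mail for this address
    if 500 <= code < 600:
        return False  # e.g. 550: mailbox unavailable / user unknown
    return None       # 4xx (temporary failure, greylisting) or anything unexpected

def mailbox_accepted(address, mx_host, sender="verify@example.com", timeout=10.0):
    """Probe mx_host for `address`; leaving the with-block sends QUIT, so no mail is sent."""
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
        smtp.helo()
        code, _ = smtp.mail(sender)
        if code != 250:
            return None
        code, _ = smtp.rcpt(address)
    return classify_rcpt_reply(code)
```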

Rob
This is an interesting concept, but many registrants are still stuck on free webmail accounts (Hotmail and the like). This could be a nice addition to my checks, though. Maybe I need a point system for validation...
Mahdi.Montgomery
Yes, but it's still possible to check whether a Hotmail address exists on Hotmail's server, isn't it? Use the link, Luke :p
Rob
A: 

Why not crowd source the solution?

This might sound weird, but bear with me. They all have something in common: they are all people in the same or similar industries. So for somebody to register and be approved, they need to be "vouched" for by somebody who already has approved access. Since reporters are likely to know other reporters, and wholesalers other wholesalers, all a person needs to do is register and ask somebody to "vouch" for their legitimacy.

The exact implementation is of course up to you; for example, you could require that every person be vouched for by at least 2 different people before their account is approved. The system will require some initial manual work, because you'll need to approve the first few people by hand, but the more people register, the more likely each new applicant is to know somebody who already has access. Over time the system becomes more and more self-sufficient.

The only issue I see is if the number of users is small and the people who register don't know anyone else on the site.
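A minimal sketch of that vouching rule, assuming an in-memory store (a real site would keep this in its database). `VouchRegistry` is a hypothetical name; the two-voucher requirement and the manually approved seed members come from the answer.

```python
from collections import defaultdict

REQUIRED_VOUCHES = 2  # "at least 2 different people", as suggested above

class VouchRegistry:
    def __init__(self, seed_members):
        # The first members are approved by hand
        self.approved = set(seed_members)
        self.vouches = defaultdict(set)  # applicant -> distinct approved vouchers

    def vouch(self, voucher, applicant):
        """Record a vouch and return whether the applicant is now approved."""
        # Only already-approved members count, and nobody can vouch for themselves
        if voucher in self.approved and voucher != applicant:
            self.vouches[applicant].add(voucher)
            if len(self.vouches[applicant]) >= REQUIRED_VOUCHES:
                self.approved.add(applicant)
        return applicant in self.approved
```

Because vouchers are stored in a set, the same member vouching twice still counts once.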

theAlexPoon
But for a small pool of applicants, human verification is the best way to go, until the number of applications per day approaches the maximum work capacity of the pool of verifiers. Use the verified applications to develop the heuristics for the automated system, and introduce it while the number of applicants per day is still relatively small, so that its results can be verified and its heuristics adapted appropriately. And, of course, *always* have a QA check on the rejected applications (or a subset thereof).
David Thomas
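The staged rollout in the comment above needs two small pieces: a measure of how often the automated decision agrees with the human one, and a random sample of automatic rejections for QA. A minimal sketch; the function names and the 10% default sample size are hypothetical.

```python
import random

def agreement_rate(auto_decisions, human_decisions):
    """Share of applications (keyed by id) where automated and human decisions agree."""
    shared = auto_decisions.keys() & human_decisions.keys()
    if not shared:
        return None
    same = sum(auto_decisions[app] == human_decisions[app] for app in shared)
    return same / len(shared)

def qa_sample(rejected_ids, fraction=0.1, rng=None):
    """Randomly pick automatically rejected applications for human re-review.

    Returns at least one id whenever there are any rejections.
    """
    rng = rng or random.Random()
    rejected = list(rejected_ids)
    if not rejected:
        return []
    k = min(len(rejected), max(1, round(len(rejected) * fraction)))
    return rng.sample(rejected, k)
```

While the automated system runs alongside human review, a rising agreement rate is a reasonable signal that its heuristics can be trusted with more of the load.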
A: 

Prefaced with: I'm not a lawyer nor e-commerce expert.

If this is an international deal, there is no universal standard to check against. Furthermore, you would shut out businesses whose licenses are pending, or independent owners who aren't interested in the registration bureaucracy. Automation only works if a universal standard or a single data source exists for whatever it is you're automating.

Lacking that, you need an EULA to explain your business policy and to hold the user liable for fraudulent submissions. In lieu of the EULA, you may (depending on the jurisdiction and local laws) have the client complete an affidavit of some sort and fax or scan/email it back to you before account activation. Follow up every submission with a phone call to the business phone number registered with their local town council, BBB, commerce agency, or other local government, public service, or private organizational registrar, just to confirm registration. Treat the contact phone number, name, or email address they submit on your site only as something to double-check against, never as a definitive contact resource.

Furthermore, you can add verification by requiring a credit card payment of a nominal service fee, the absolute minimum the card allows (typically $1, it seems). Though never foolproof, this transaction serves only to identify the client. See Craigslist and USPS.

I think I just answered your question with "Ask a lawyer experienced in the field".

bob-the-destroyer
But it seems like an interesting web-service to establish, doesn't it? Create your own 'universal business checking engine' =)
David Thomas
@David Thomas: It all depends on the niche you're covering (epa certs, ohs, sarbanes oxley, or whatever classification and certification is in season). Without knowing anything about OP's user base, I'd say just find the common denominator, and if there's just one or two global agencies servicing them all you could query. Otherwise, there is generally nothing universal about locality.
bob-the-destroyer
@bob, absolutely. But my thought was (and I don't have the cojones to *implement* it) that if you take a small registration fee to verify business credentials (say one country-market only in the short term, using whatever means you find appropriate) and then provide an API for other businesses to check against...It sounds interesting, but a huge headache.
David Thomas
@David Thomas: you're talking about a full global business registration service, far beyond the scope of a single private business just trying to do its... business. What you suggest is always helpful and has been attempted many times over. It's just not realistically achievable in the global marketplace, nor something OP should get involved in _unless_ that exact service is the product he's _completely_ dedicated to selling, being liable for, and willing to take a total loss on (IMHO). As for crowdsourcing: a mob is just as trustworthy as an anonymous individual. Take 4chan, for instance...
bob-the-destroyer
@bob, I was only suggesting that it sounded like an interesting project, but certainly not one I'd recommend any sane person attempt. As for the crowd-sourcing thing, yeah. I'd only suggest using a team if that team were directly employed by the business undertaking the project. As much as I'd *like* to trust crowd-sourcing (Wikipedia, for all its flaws, is a good model), it does have its downsides (4chan and Anonymous being among the most recognisable). I wasn't aware that it had already been attempted, though.
David Thomas
@David Thomas: I'd argue the entire internet is crowdsourcing. For specific examples, see Facebook, Reddit, Fark, Sensible Erection, Yelp, Google, Yahoo, or any other public blog or website using user feedback or user generated content as its own content.
bob-the-destroyer
@David Thomas: Google might one day come up with what you are suggesting. They've got all the scope they need, except in China.
Aditya Menon
@Aditya, I'm sort of surprised that they've not already attempted it, to be honest. Though I suppose one of the trusted [Certificate authorities](http://en.wikipedia.org/wiki/Certificate_authority) might be better placed, given their already-explicit 'trustedness.'
David Thomas
^^ After Buzz and a zillion other failed ideas, maybe Google _finally_ decided to make sure they'd succeed before they jump into anything.
Aditya Menon
A: 

Once you get past the human vs. bot screening, you're looking for some way of distinguishing the merely curious non-target visitor (say, the proverbial 14-year old kid) from those you want to let in. As the other commenters say, there's no universal Turing machine way to identify, let alone evaluate, a purported reporter or business person who might be interested in your site.

One thing you might consider is posing the question "Please briefly describe your interest in [the site] and the specific aspects of our products or services that interest you." Then develop some experience-based heuristics for automated screening: first, run the answer through a spam filter, then score it for keywords, and so on.
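A minimal sketch of that screening order (spam check first, then keyword scoring). `looks_like_spam` is a crude stand-in for a real spam filter, and the keyword list and weights are hypothetical; in practice you would derive them from applications you have already reviewed by hand.

```python
import re

# Hypothetical keyword weights; tune them from hand-reviewed applications
KEYWORD_WEIGHTS = {
    "wholesale": 3, "resale": 3, "distributor": 3, "dealer": 3,
    "retail": 2, "store": 2, "press": 2, "editor": 2, "article": 2,
}

def looks_like_spam(text):
    """Crude placeholder for a spam filter: several links, or almost no text."""
    return len(re.findall(r"https?://", text)) > 2 or len(text.split()) < 5

def interest_score(text):
    """Sum the weights of distinct keywords in the answer; spam scores 0."""
    if looks_like_spam(text):
        return 0
    words = set(re.findall(r"[a-z]+", text.lower()))  # repeats don't inflate the score
    return sum(KEYWORD_WEIGHTS.get(word, 0) for word in words)
```

For example, "We run a retail store and want wholesale pricing for resale" scores 10, while "I am just curious about this site" scores 0.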

Richard Careaga
Thanks Richard. Ultimately, this is the way I ended up going. I validated across several fields, with a final max possible score of 50 points, each successful (or partially successful) field added to the score. If the score was >= 40 points, I gave them instant access. The rest could then wait for human validation.
Mahdi.Montgomery
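The point system described in the comment above (several fields, 50 points maximum, 40 or more grants instant access, everything else waits for a human) can be sketched as follows. The field names and per-field weights are hypothetical; only the 50-point maximum and the 40-point threshold come from the comment.

```python
MAX_SCORE = 50
INSTANT_ACCESS_THRESHOLD = 40

# Hypothetical per-field weights, chosen to sum to MAX_SCORE
FIELD_WEIGHTS = {
    "email_domain_matches_website": 15,
    "whois_matches_company": 10,
    "phone_format_valid": 5,
    "address_format_valid": 5,
    "interest_statement": 15,
}

def total_score(results):
    """results maps field -> fraction in [0, 1] (1 = passed, 0.5 = partial, 0 = failed)."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        fraction = min(max(results.get(field, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        score += weight * fraction
    return score

def decide(results):
    """Instant access at or above the threshold; otherwise queue for human review."""
    if total_score(results) >= INSTANT_ACCESS_THRESHOLD:
        return "instant_access"
    return "human_review"
```

Partial credit lets a field such as the interest statement contribute without having to pass outright.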