A new feature I wish to add to our local network is the ability to retrieve email from free email services such as Gmail, Yahoo and Hotmail using PHP. There are services we can pay for but I would rather hack it up myself!

As far as I can tell, only Google offers an API; the others do not. What problems would I run into if I just retrieved the email using cURL?

I have already implemented the Gmail part using cURL and PHP.

A: 

I assume you have a reason for not using POP, which is the standard, supported way to retrieve email. Doing it the way you describe is not supported and may also violate the providers' terms of use.

But as long as no CAPTCHA gets in your way, it is technically possible. You will have to write a different application for each provider, and whenever a provider changes something you will have to adapt your application.

To make it work with cURL, be sure to collect every cookie that each page sets and send them back with every request.
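
A minimal sketch of that cookie handling, assuming PHP's cURL extension is available. The function names, the URL, and the cookie file path are hypothetical placeholders, not real provider endpoints; `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` are the standard options for saving and resending cookies.

```php
<?php
// Build cURL options that persist cookies across requests.
// $cookieFile is a hypothetical path; any writable file works.
function cookie_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // login flows usually redirect
        CURLOPT_COOKIEJAR      => $cookieFile, // write received cookies here when the handle closes
        CURLOPT_COOKIEFILE     => $cookieFile, // enable the cookie engine and load stored cookies
    ];
}

// Fetch one page. Reusing the same handle keeps cookies in memory between calls.
function fetch_page($handle, string $url, string $cookieFile)
{
    curl_setopt_array($handle, cookie_options($url, $cookieFile));
    return curl_exec($handle); // string on success, false on failure
}
```

Creating one handle with `curl_init()` and passing it to `fetch_page()` for every request in a session means the cookies set by the login page are sent automatically with the later requests.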

If you run into problems (and also during development), you can analyze the HTTP requests and responses with a tool such as Proxomitron on Windows, and make your cURL requests look more and more like the browser's requests until you succeed. In the end there is little they can do to distinguish your cURL requests from a human using a browser, except for CAPTCHAs, as I said before.

Another thing to watch is the interval between your requests. You could get blocked for requesting too often, or for leaving no pause between two requests (which a human cannot do). Try inserting randomly varied pauses between requests if you suspect this.
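
One way to sketch such randomized pauses in PHP (7.0 or later). The function names and the default base and jitter values are arbitrary illustrations, not known thresholds for any provider.

```php
<?php
// Return a pause length in microseconds: a fixed base delay plus random jitter.
// Both arguments are in seconds; the defaults are illustrative only.
function polite_delay_us(float $baseSeconds = 2.0, float $jitterSeconds = 3.0): int
{
    $jitterUs = random_int(0, (int) round($jitterSeconds * 1000000));
    return (int) round($baseSeconds * 1000000) + $jitterUs;
}

// Sleep for a randomized interval before the next request.
function polite_pause(float $baseSeconds = 2.0, float $jitterSeconds = 3.0): void
{
    $us = polite_delay_us($baseSeconds, $jitterSeconds);
    sleep(intdiv($us, 1000000)); // whole seconds
    usleep($us % 1000000);       // remaining fraction of a second
}
```

Calling `polite_pause()` between requests spaces them out by somewhere between the base delay and the base delay plus the jitter.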

I can imagine them blocking your accounts or IPs during development. In that case you would need to change the IP and/or the account you work with.

Is it not allowed to use cURL to retrieve email from these service providers? Or any form of email retrieval, for that matter?
Abs
I'd say that depends on their terms of use and their business model. And spammers also use bots to register accounts and send mails, so they possibly fight it, even if you are a good guy.
+2  A: 

It almost certainly violates their terms of service to screen-scrape their websites for that purpose. If they redesign their site, the scripts you're using to parse out the e-mail contents etc. will probably break catastrophically, as well.

Yahoo, Gmail, and Hotmail all support POP3, a standard protocol for retrieving e-mails. Why not use that instead?

ceejayoz
+1  A: 

When someone gives you an API, they're promising you that "if you run code X, Y will happen." When you screen scrape, there's no such promise from the provider, and many providers have items in their terms of service that explicitly forbid screen scraping. From a technical standpoint, this means their page/application may undergo changes that will break your screen scraping, whether accidentally or purposefully on the provider's part. This is why CAPTCHAs exist.

Also, these applications are increasingly using "AJAX" style architectures, which means you're committing yourself to reverse engineering how each application works, as well as keeping up with the changes each application makes.

Finally, well, you're doing it wrong. Email is a set of protocols in and of itself. Most providers have a way to access email via POP3 and IMAP. I'd look into hacking PHP code to interact with the POP/IMAP servers which, like an API, are a promised set of behaviors. You also have the advantage that code written for one provider will likely work (with minor tweaks) for another.
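
As a rough sketch of what that could look like, assuming PHP's IMAP extension is installed (it handles both IMAP and POP3). The function names and the `example.com` hosts are hypothetical placeholders; each provider documents its own server names.

```php
<?php
// Build the mailbox string that imap_open() expects,
// for example "{imap.example.com:993/imap/ssl}INBOX".
function mailbox_string(string $host, int $port, string $protocol = 'imap', string $folder = 'INBOX'): string
{
    return sprintf('{%s:%d/%s/ssl}%s', $host, $port, $protocol, $folder);
}

// List the subject lines in a mailbox. Requires the PHP IMAP extension.
function list_subjects(string $mailbox, string $user, string $password): array
{
    $conn = imap_open($mailbox, $user, $password);
    if ($conn === false) {
        throw new RuntimeException('Connection failed: ' . imap_last_error());
    }
    $subjects = [];
    $count = imap_num_msg($conn); // message numbers start at 1
    for ($i = 1; $i <= $count; $i++) {
        $header = imap_headerinfo($conn, $i);
        $subjects[] = isset($header->subject) ? $header->subject : '(no subject)';
    }
    imap_close($conn);
    return $subjects;
}
```

Switching providers then means changing only the host, port, and protocol passed to `mailbox_string()`, for instance `'pop3'` on port 995 for POP3 over SSL instead of IMAP on port 993.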

Alan Storm
Ah, thank you for the explanation. These were the things I was afraid of, and you have provided a solution.
Abs