I want to write a program that analyzes your fantasy baseball team and notifies you of recommended actions, possibly multiple times per day. The problem is that you aren't playing fantasy baseball on my site; you're playing on Yahoo, CBS, ESPN, etc.

On the majority of these sites, fantasy teams and leagues are not public, so you must be logged in and a member of the league to see the teams in the league.

All I need is for the plain HTML of the team page on each of those sites to be sent to my server, where I can parse and analyze it and send notifications to the user.
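
Once the HTML reaches the server, Python's standard-library `html.parser` is enough for a first pass at parsing it. The sketch below rests on an assumption: that the team page marks each rostered player with a `<td class="player">` cell. That markup is hypothetical; each site (and each redesign of it) will need its own rules.

```python
from html.parser import HTMLParser


class PlayerNameParser(HTMLParser):
    """Collect the text of <td class="player"> cells (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.players = []
        self._in_player_cell = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            if "player" in classes:
                self._in_player_cell = True
                self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_player_cell:
            name = "".join(self._buf).strip()
            if name:
                self.players.append(name)
            self._in_player_cell = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. a link around the name) is collected too.
        if self._in_player_cell:
            self._buf.append(data)


def extract_players(html):
    parser = PlayerNameParser()
    parser.feed(html)
    parser.close()
    return parser.players
```

A stdlib parser keeps the sketch dependency-free; a library with CSS selectors would make per-site rules shorter to write.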

The problem is that I need username/password combinations to get this data to my server easily whenever I need it, and I suspect a lot of people wouldn't want to entrust their Yahoo/ESPN/CBS password to me.

I have come up with several possible ways to solve this problem:

  1. The most obvious way is to ask for their credentials for the site on which their team is hosted. Then I could programmatically log in and request the data I need. I'm guessing some people would be comfortable giving me their credentials, and some would not.

  2. Write a desktop client that the user downloads. The client would require their credentials, but it could then do exactly what the server-based version would do: log in, request the page, and send the page back to my server. The difference is that their password would never need to leave their desktop. Their computer would need to be on, and this program running, for this method to work.

  3. Write browser add-ons that navigate to the page I need, use the cookie saved from a previous login to access the site, and send the page back to my server. This never requires my software to ask for their password, but if the cookie expires I am hosed, and besides, I don't know much about browser add-ons.
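
Options 1 and 2 boil down to the same steps: log in with a form POST, keep the session cookie, then request the team page. The only difference is whether the code runs on your server or on the user's desktop. Here is a minimal sketch using Python's standard library, assuming a hypothetical login URL, team URL, and form field names (`username`, `password`); real sites often add hidden fields such as CSRF tokens, redirects, or JavaScript-driven logins that this does not handle.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoints; real sites differ and change without notice.
LOGIN_URL = "https://fantasy.example.com/login"
TEAM_URL = "https://fantasy.example.com/league/123/team/4"


def make_session():
    """Return an opener that stores cookies in `jar` and replays them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password, url=LOGIN_URL):
    """Build a form-encoded POST; the field names are assumptions."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(
        url,
        data=form.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_team_page(username, password, team_url=TEAM_URL):
    """Log in, then fetch the team page with the same cookie-carrying opener."""
    opener, _jar = make_session()
    with opener.open(build_login_request(username, password), timeout=30) as resp:
        resp.read()  # the body is ignored; the cookies it set are what matter
    with opener.open(team_url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For option 2, the same `fetch_team_page` would run inside the desktop client, which would then upload only the returned HTML, so the password never leaves the user's machine.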

I'm sure there are other options, but these are what I've come up with so far.

I have two questions: 1. What other possibilities are there for this type of task? 2. Am I overestimating people's reluctance to give me their Yahoo (for example) password? Is option (1) above the obvious choice?

It was suggested in the comments that I try Yahoo Pipes, which looked promising, so I explored it a bit. Having looked at it, I don't think it is an option, so it looks like I'll be going with option 1.

+1  A: 

Option 1 is the obvious choice. People who trust your site will provide the details. There is no other way to log in to another site while screen scraping.

Bhushan
I think you'll be fine with option 1. You're asking for the password for a fantasy baseball team, not a bank account, so I expect most people would be happy to hand it over.
Dave Webb
That's true, but if the site were, for example, Yahoo, you might also be handing over access to your email, groups, and other services, since they could all be linked to the same account.
Zxaos
Zxaos, that's where my skittishness would come from as a user of this product. I do a number of things on Yahoo, including email, that I probably wouldn't want to hand over for an edge in fantasy baseball. For ESPN or CBS, I probably wouldn't have the same concern, at least not to the same degree.
Brad
+2  A: 

A potentially more complicated solution could be built with (for example) Yahoo Pipes.

Hypothetically, you create a pipe that prompts the user for their credentials and gives them a URL containing their scraped data. They then enter this URL on your site and never have to provide their credentials to you directly. Even better, security-conscious users could examine what the pipe actually does before entering any information.

The downside would be increased complexity, and you'd have to write and maintain the pipe. That said, you could link directly to the published pipe from your site to make things as easy as possible.

Zxaos
I am looking into Yahoo Pipes; thanks for the heads-up. I don't know whether it will work, but if nothing else it's an interesting-looking project from Yahoo.
Brad
+1  A: 

Hi Brad,

This is a problem I grappled with a couple of years ago when I wanted to do the same thing. Our site is http://benchcoach.com and the options we were considering were the following:

Originally, we considered asking for the user's credentials. We would then log in and scrape their league and team info. The problem is that, after reading the terms of service of several of these sites, this would definitely violate them. On top of this, Yahoo! was one of the sites we were considering, and its users have email (where we could get access to sensitive data) and Yahoo! Wallet. In addition, it would be trivial for Yahoo/ESPN/CBS to block our programmatic logins by IP address.

The solution we settled on (we're not 100% happy with it, but it does seem to work) was asking our users to install a bookmarklet (like the ones for Delicious, Digg or Reddit) that posts the current HTML page to our servers, where we parse the data and load our database. If users were still logged into their Yahoo/ESPN/CBS account, we would send them straight to the relevant pages; otherwise, those sites would prompt them to authenticate. Clicking the bookmarklet once more would post the page to our servers.
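
The server side of the bookmarklet approach can be small. The sketch below assumes the bookmarklet submits a form-encoded POST to a hypothetical `/upload` path with two hypothetical fields, `url` (the page's address) and `html` (its markup); the bookmarklet itself would be JavaScript and is not shown. It uses only Python's `http.server`, which is fine for a sketch but not for production.

```python
import urllib.parse
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_BODY = 5 * 1024 * 1024  # arbitrary cap on an uploaded page, in bytes


def parse_upload(body, content_type):
    """Return (page_url, html) from a form-encoded POST body or raise ValueError."""
    if not content_type.startswith("application/x-www-form-urlencoded"):
        raise ValueError("unexpected content type: " + content_type)
    # UnicodeDecodeError is a subclass of ValueError, so bad bytes are rejected too.
    fields = urllib.parse.parse_qs(body.decode("utf-8"), keep_blank_values=True)
    if "url" not in fields or "html" not in fields:
        raise ValueError("expected 'url' and 'html' fields")
    return fields["url"][0], fields["html"][0]


class UploadHandler(BaseHTTPRequestHandler):
    # In-memory store for the sketch; a real service would queue pages for parsing.
    received = []

    def do_POST(self):
        if self.path != "/upload":
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", "0"))
        except ValueError:
            length = 0
        if length > MAX_BODY:
            self.send_error(413)
            return
        if length <= 0:
            self.send_error(400, "empty body")
            return
        body = self.rfile.read(length)
        try:
            page = parse_upload(body, self.headers.get("Content-Type", ""))
        except ValueError as exc:
            self.send_error(400, str(exc))
            return
        UploadHandler.received.append(page)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write("Page received.".encode("utf-8"))


def run(host="127.0.0.1", port=8000):
    HTTPServer((host, port), UploadHandler).serve_forever()
```

To try it, call `run()` and have the bookmarklet submit its form to `http://127.0.0.1:8000/upload`. A deployed endpoint would probably need HTTPS, because browsers may warn about or block submissions from HTTPS pages to plain-HTTP addresses.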

The pros of this approach were that we never collected anyone's credentials, so security concerns were alleviated. Secondly, it made it impossible for Yahoo/ESPN/CBS to block access to our service, since we never connected directly to their servers; instead, the user's browser posted the contents of the page to our server.

The problem with this is that it takes 2 clicks to post a page to our site. For head-to-head leagues, we needed 3-4 pages, so it would take our users 6-8 clicks to sync their league to our servers. We're still looking at options for this.

One important note: I ran into the product manager of the Yahoo Fantasy Football site at a conference a year ago. We talked about how we were getting the Yahoo data, and he confirmed that collecting credentials would violate their TOS and that they might stop us. While I don't think they would have, it would have been hard to invest the time and energy to develop this only to have them block our site and piss off users by closing their accounts.

In any case, if you have any questions or want to collaborate on this, let me know and I'll be happy to discuss our approach and technology. You can contact me at spark at benchcoach dot com.

Hope this helps.

sparky
Thanks for the reply. I agree that seems like a good compromise, but for what I was hoping to do I really needed real-time access. I was able to have users log in once and then keep the session cookie until it expired (so I could dispose of their credentials), but after that another login was required. As for the TOS, there's an iPhone app for managing Yahoo fantasy teams that stores credentials, but even that has now broken for me. No easy solution, I guess.
Brad
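
The workaround Brad describes (log in once, keep the session cookie, discard the password, and ask for a new login once the cookie expires) can be sketched with `http.cookiejar`. The cookie name `SESSION` and the domain used below are hypothetical placeholders, and a cookie that has not expired locally may still be invalidated by the site at any time, so a real client would also need to detect a bounce to the login page.

```python
import http.cookiejar
import time


def save_session(jar: http.cookiejar.LWPCookieJar, path: str) -> None:
    # ignore_discard=True keeps session cookies (no expiry), which many sites use for login.
    jar.save(path, ignore_discard=True)


def load_session(path: str) -> http.cookiejar.LWPCookieJar:
    jar = http.cookiejar.LWPCookieJar()
    # With the default ignore_expires=False, already-expired cookies are skipped on load.
    jar.load(path, ignore_discard=True)
    return jar


def session_still_valid(jar, domain, cookie_name, now=None):
    """True if `jar` holds an unexpired cookie named `cookie_name` for `domain`."""
    now = time.time() if now is None else now
    for cookie in jar:
        if cookie.name == cookie_name and cookie.domain.lstrip(".") == domain:
            # expires is None for session cookies; is_expired() then returns False.
            return not cookie.is_expired(now)
    return False
```

When `session_still_valid(jar, "fantasy.example.com", "SESSION")` returns False, the user has to log in again. While it is valid, the saved cookie file grants the same access as the password, so it deserves the same protection.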
A: 

This is an ideal job for screen-scraper. Any edition can be made to log into a site and will maintain the session for you. If you want your clients to run it themselves, you will need to have them install the software and deliver your scripts for this site to them.

For the third step, the site you bookmark might be difficult to link to if there are session or cookie issues, but some solution is possible, e.g., a cached version of the page.

Jason Bellows