pffft ...
I've had a need to do similar things with feed aggregation and building RSS feeds from web page content on different domains.
The user gets the app1 page, fills in the details, and submits; then on the server for app1 I have a method that looks like this ...
HTMLDocument FetchURL( string url )
{
    // needs a reference to the mshtml COM interop assembly,
    // plus using System.Net; and using mshtml;
    WebClient wc = new WebClient();
    string remoteContent = wc.DownloadString(url);

    // the mshtml api is very weird but let's just say you have to do things this way ...
    HTMLDocument doc = new HTMLDocument();
    IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
    doc2.write(new object[] { remoteContent });
    doc2.close();
    return (HTMLDocument)doc2;
}
This function does two useful things ...
- It gets the page of content at "url"
- It parses that content into an HTMLDocument object
Once you have this function you can call it with the URL of the remote page and get back an HTML document.
The functions on the HTMLDocument object allow you to do JavaScript-like DOM queries such as:
docObject.getElementById("id");
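For example, here's a minimal usage sketch; the URL and the "pagetitle" element id are made up for illustration:
HTMLDocument page = FetchURL("http://example.com/somepage.html");
// "pagetitle" is an assumed id - use whatever the real page exposes
IHTMLElement heading = page.getElementById("pagetitle");
if (heading != null)
{
    string text = heading.innerText; // the element's visible text
}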
I then have different functions that do different things with this object based on the page / site I'm returning data from.
There is however one fatal flaw here ...
This is likely to work really well with sites whose markup is generated by code and doesn't change much in structure, but not so well on hand-built sites where the layout varies from page to page.
With Stack Overflow, for example, it's easy to pull out a question and the accepted answer for that question, so I could use this code to pull and publish content from here on my own web site.
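A rough sketch of that kind of scrape might look like the following; the element id and class name here are assumptions, not Stack Overflow's actual markup, so you'd need to inspect the real page source for the right ones:
HTMLDocument doc = FetchURL("https://stackoverflow.com/questions/12345");
// "question" is an assumed element id
IHTMLElement question = doc.getElementById("question");
// simple className scan for an assumed "accepted-answer" class
IHTMLElement acceptedAnswer = null;
foreach (IHTMLElement el in doc.all)
{
    if (el.className != null && el.className.Contains("accepted-answer"))
    {
        acceptedAnswer = el;
        break;
    }
}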
However ...
This is not going to help you with user / login related details, as that sort of information is not generally shared with everyone.
It's a bit like me trying to use this to link Facebook profiles to my own website; I would have to go through some form of API that asks the user to authenticate before making the request.
Simply pulling a web page by URL alone gives the other site no authentication information, unless that site accepts the user's login details in the querystring and you already have them.
You may however be able to chain requests by ripping apart my sample method: request the login page, parse the result, fill in the form, post it back using the same web client instance to log in, then request the URL you're after.
The idea being that you would have a form on your site asking the user for their login details for the remote site, and you then go and find their profile page based on that.
This would be best farmed out to a class rather than just a simple method like I have here.
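One gotcha if you try this: a plain WebClient won't keep cookies between requests, so the login won't "stick". Here's a minimal sketch of such a class, assuming cookie-based auth; the URLs and the "username" / "password" field names are assumptions you'd replace with whatever the real login form uses:
// needs using System; using System.Net; and using System.Collections.Specialized;
// a WebClient subclass that shares one CookieContainer across requests
// so the session survives from the login post to the follow-up request
class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
            http.CookieContainer = cookies; // carry cookies between requests
        return request;
    }
}
Usage would then be something like:
CookieAwareWebClient wc = new CookieAwareWebClient();
NameValueCollection form = new NameValueCollection();
form.Add("username", "bob");    // assumed field name
form.Add("password", "secret"); // assumed field name
wc.UploadValues("http://remotesite.example/login", form); // posts the login form
string profilePage = wc.DownloadString("http://remotesite.example/profile"); // now authenticated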
In my case though I was only after something simple (the BBC Top 40 UK charts), so I pulled information not only from the BBC but also from places like Amazon, Google, and YouTube, then built a page :)
It's neat but serves no functional purpose other than pulling all your other fave sources of info onto one page.