Hi,

Could I get some easy-to-follow code examples for the following:

  1. Use a browser control to send a request to a target website.
  2. Capture the response from the target website.
  3. Convert the response into a DOM object.
  4. Iterate through the DOM object and capture things like "FirstName", "LastName", etc. if they are part of the response.

Thanks

+1  A: 

You may take a look at Html Agility Pack and/or SgmlReader. Here's an example using SgmlReader which selects all the nodes in the DOM containing some text:

using System;
using System.Xml;
using Sgml; // namespace that contains SgmlReader

class Program
{
    static void Main()
    {
        using (var reader = new SgmlReader())
        {
            reader.Href = "http://www.microsoft.com";
            var doc = new XmlDocument();
            doc.Load(reader);
            var nodes = doc.SelectNodes("//*[contains(text(), 'Products')]");
            foreach (XmlNode node in nodes)
            {
                Console.WriteLine(node.OuterXml);
            }
        }
    }
}
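
For comparison, here is a minimal sketch of the Html Agility Pack route, which the example above does not cover. It assumes the values appear as form inputs named FirstName and LastName (names taken from the question); the URL is a placeholder, and the package has to be installed separately.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class HapExample
{
    static void Main()
    {
        // HtmlWeb downloads the page and parses it into a DOM in one step.
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://www.example.com/form"); // placeholder URL

        // Assumption: the values live in <input name="FirstName"> / <input name="LastName">.
        HtmlNodeCollection inputs =
            doc.DocumentNode.SelectNodes("//input[@name='FirstName' or @name='LastName']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (inputs == null)
        {
            Console.WriteLine("No matching fields found.");
            return;
        }

        foreach (HtmlNode input in inputs)
        {
            string name = input.GetAttributeValue("name", "");
            string value = input.GetAttributeValue("value", "");
            Console.WriteLine("{0} = {1}", name, value);
        }
    }
}
```

If the data sits in table cells or spans rather than inputs, only the XPath expression needs to change.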
Darin Dimitrov
+2  A: 

Here is code that uses a WebRequest object to retrieve data and captures the response as a stream.

    public static Stream GetExternalData( string url, string postData, int timeout )
    {
        ServicePointManager.ServerCertificateValidationCallback += delegate( object sender,
                                                                                X509Certificate certificate,
                                                                                X509Chain chain,
                                                                                SslPolicyErrors sslPolicyErrors )
        {
            // Return true only if you trust the server implicitly; otherwise validate.
            // This version accepts only certificates that passed the default checks.
            return sslPolicyErrors == SslPolicyErrors.None;
        };

        WebRequest request = null;
        HttpWebResponse response = null;

        try
        {
            request = WebRequest.Create( url );
            request.Timeout = timeout; // force a quick timeout

            if( postData != null )
            {
                request.Method = "POST";
                request.ContentType = "application/x-www-form-urlencoded";
                request.ContentLength = postData.Length;

                using( StreamWriter requestStream = new StreamWriter( request.GetRequestStream(), System.Text.Encoding.ASCII ) )
                {
                    requestStream.Write( postData );
                    requestStream.Close();
                }
            }

            response = (HttpWebResponse)request.GetResponse();
        }
        catch( WebException ex )
        {
            Log.LogException( ex );
        }
        finally
        {
            request = null;
        }

        if( response == null || response.StatusCode != HttpStatusCode.OK )
        {
            if( response != null )
            {
                response.Close();
                response = null;
            }

            return null;
        }

        return response.GetResponseStream();
    }

For managing the response, I use a custom XHTML parser, but it is thousands of lines of code. There are several publicly available parsers (see Darin's answer).
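
If the response happens to be well-formed XHTML, the built-in XmlDocument may be enough to turn the stream into a DOM. The sketch below is only an illustration under that assumption: ResponseParser and ExtractFields are hypothetical names, the field names come from the question, and the sample markup is made up. Ordinary HTML usually needs one of the tolerant parsers mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

static class ResponseParser
{
    // Hypothetical helper: returns the value attribute of every <input> element
    // whose name attribute is listed in fieldNames.
    // Assumes the stream contains well-formed XML (XHTML) without a DOCTYPE.
    public static Dictionary<string, string> ExtractFields(Stream response, params string[] fieldNames)
    {
        var wanted = new HashSet<string>(fieldNames);
        var found = new Dictionary<string, string>();

        var doc = new XmlDocument();
        doc.Load(response); // throws XmlException if the markup is not well-formed

        foreach (XmlElement input in doc.GetElementsByTagName("input"))
        {
            string name = input.GetAttribute("name");
            if (wanted.Contains(name))
            {
                found[name] = input.GetAttribute("value");
            }
        }

        return found;
    }
}

class Demo
{
    static void Main()
    {
        // Made-up sample markup standing in for a real response stream.
        string xhtml =
            "<html><body><form>" +
            "<input name=\"FirstName\" value=\"Ada\" />" +
            "<input name=\"LastName\" value=\"Lovelace\" />" +
            "</form></body></html>";

        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xhtml)))
        {
            foreach (var pair in ResponseParser.ExtractFields(stream, "FirstName", "LastName"))
            {
                Console.WriteLine(pair.Key + " = " + pair.Value);
            }
        }
    }
}
```

In real use, the stream returned by GetExternalData would take the place of the MemoryStream, and it should be disposed once parsing is done.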

EDIT: per the OP's question, headers can be added to the request to emulate a user agent. For example:

    // Accept and UserAgent are HttpWebRequest properties, so the variable
    // must be typed as HttpWebRequest rather than WebRequest.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create( url );
    request.Accept = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*";
    request.Timeout = timeout;
    request.Headers.Add( "Cookie", cookies ); // cookies: cookie header string supplied by the caller

    // manifest as a standard user agent
    request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US)";
Tim
+1  A: 

You could also use Selenium to easily traverse the DOM and grab the values of the fields. It will also automatically open the browser for you.
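
A rough sketch of what that could look like with the Selenium WebDriver bindings for C#. The URL is a placeholder, the field names are taken from the question, and it assumes Firefox with its driver is installed; exact calls may differ between Selenium versions.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

class SeleniumExample
{
    static void Main()
    {
        // Opens a real Firefox window controlled by the driver.
        IWebDriver driver = new FirefoxDriver();
        try
        {
            driver.Navigate().GoToUrl("http://www.example.com/form"); // placeholder URL

            foreach (string fieldName in new[] { "FirstName", "LastName" })
            {
                // FindElements returns an empty collection when nothing matches,
                // so fields that are missing from the page are simply skipped.
                foreach (IWebElement element in driver.FindElements(By.Name(fieldName)))
                {
                    Console.WriteLine("{0} = {1}", fieldName, element.GetAttribute("value"));
                }
            }
        }
        finally
        {
            // Close the browser even if something above throws.
            driver.Quit();
        }
    }
}
```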

Climber104
A: 

Here is a four-part tutorial that covers what you want.

This is the first part; all four parts are here (How to Write a Search Engine).

M.H
+1  A: 

If you want a pure C# way to traverse web pages, a good place to look is WatiN. It allows you to easily open a web browser and go through the web page (and actions) via C# code.

Here's an example that searches Google with the API (taken from their docs):

using System;
using WatiN.Core;

namespace WatiNGettingStarted
{
  class WatiNConsoleExample
  {
    [STAThread]
    static void Main(string[] args)
    {
      // Open a new Internet Explorer window and
      // go to the Google website.
      IE ie = new IE("http://www.google.com");

      // Find the search text field and type WatiN in it.
      ie.TextField(Find.ByName("q")).TypeText("WatiN");

      // Click the Google search button.
      ie.Button(Find.ByValue("Google Search")).Click();

      // Uncomment the following line if you want to close
      // Internet Explorer and the console window immediately.
      //ie.Close();
    }
  }
}

KallDrexx
Thank you for the sample code.
dotnet-practitioner