views:

382

answers:

9

Hi, I've been programming for many years now, and I have just one question.

What programming language allows you to create programs that can automatically navigate websites and perform various actions? For example: logging in, browsing to a specific page, filling out forms, extracting certain text, and so on.

This is different from a macro, as a macro only performs a predefined set of actions. The program, on the other hand, would behave differently depending on what is displayed on the screen.

Perhaps some kind of scripting language? Or a general-purpose language? Your answers will be appreciated.

+2  A: 

You can use LWP::Simple in Perl.

You can find a lot of information on the web, but "Getting more out of LWP::Simple" is a tutorial on PerlMonks.

David in Dakota
Please don't downvote me just because Perl is Undead.
David in Dakota
LWP::Simple can fetch single resources, but it doesn't have any features for navigating a web site.
brian d foy
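For comparison, a single-resource fetch of the kind LWP::Simple's get() performs looks roughly like this in Python's standard library (the URL and User-Agent string below are placeholders, and the actual network call is left commented out):

```python
import urllib.request

# Hypothetical URL and User-Agent; swap in your own values.
req = urllib.request.Request(
    "http://www.example.com/",
    headers={"User-Agent": "demo-fetcher/0.1"},
)

# Uncomment to actually perform the GET and print the body:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8", errors="replace"))

print(req.full_url)
```

As the comment above notes, this is one stateless GET; navigating a site (cookies, sessions, forms) needs more machinery.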
A: 

Pretty much any language will do this now: Perl or PHP with cURL on Linux, and ASP or C# on Windows.

Check this out - PHP Form Filling Tutorial

Jeremy Morgan
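Whatever the language, form filling boils down to URL-encoding the field values and POSTing them to the form's action URL. A minimal sketch using Python's standard library (the field names and URL are hypothetical; a real script would copy them from the target form's HTML):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields; real names come from the page's <input> elements.
body = urlencode({"username": "alice", "password": "s3cret"}).encode()
req = Request("http://www.example.com/login", data=body, method="POST")

print(req.get_method())  # -> POST
print(body)              # -> b'username=alice&password=s3cret'
```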
A: 

You can do all of this with the WebRequest class in C#:

using System;
using System.IO;
using System.Net;

public class WebRequestExample
{
    public static void Main()
    {
        // Create a request for the URL.
        WebRequest request = WebRequest.Create("http://www.contoso.com/default.html");
        // If required by the server, set the credentials.
        request.Credentials = CredentialCache.DefaultCredentials;
        // Get the response.
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        // Display the status.
        Console.WriteLine(response.StatusDescription);
        // Get the stream containing content returned by the server.
        Stream dataStream = response.GetResponseStream();
        // Open the stream using a StreamReader for easy access.
        StreamReader reader = new StreamReader(dataStream);
        // Read the content.
        string responseFromServer = reader.ReadToEnd();
        // Display the content.
        Console.WriteLine(responseFromServer);
        // Clean up the streams and the response.
        reader.Close();
        dataStream.Close();
        response.Close();
    }
}
Bob
+2  A: 

I believe you are not looking for a language but for a framework that will allow you to do this. This is typically done with web-scraping software. There are some online services, e.g. Mozenda, that let you do simple stuff. There are also frameworks that help you do the same in a more rigorous manner. I have some experience with screen-scraper, which I think is one of the most feature-rich.

Yet another type of framework is a web crawler, which goes through a website and indexes it (as a search engine does).

Jean Barmash
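The heart of a crawler is pulling the links out of each fetched page so they can be queued for the next round of fetches. A sketch of that step with Python's standard-library HTML parser (the sample markup is made up):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href on a page so a crawler can queue the targets."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<p><a href="/about">About</a> <a href="/contact">Contact</a></p>')
print(collector.links)  # -> ['/about', '/contact']
```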
+3  A: 

For Perl, WWW::Mechanize is the standard tool for navigating websites. It handles cookies and sessions, knows how to interact with forms, can perform clicks, and so on. It maintains state as it goes along.

Its one drawback is that it doesn't handle JavaScript. There are some Perl modules for interacting with JavaScript, but they aren't integrated with WWW::Mechanize.

brian d foy
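The state-keeping that WWW::Mechanize provides can be approximated in Python's standard library with a cookie-aware opener: cookies set by one response are automatically sent back on later requests. A sketch (the URLs are hypothetical, so the actual requests are commented out):

```python
import http.cookiejar
import urllib.request

# An opener that stores cookies from each response and replays them,
# which is the minimum needed for login-then-browse navigation.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# opener.open("http://www.example.com/login")  # hypothetical URL; cookies
# opener.open("http://www.example.com/page")   # from login are re-sent here

print(len(jar))  # no requests made yet, so the jar is empty -> 0
```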
+1  A: 

I'd certainly go for a scripting language, with Ruby/Mechanize being my favorite; take a look at the example below. Perl and Python are also good choices, for sure. Unless there's a plan for it to be part of some other application, I'd avoid statically typed languages: too much boilerplate code, IMHO.

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get('http://google.com/')
google_form = page.form('f')
google_form.q = 'ruby mechanize'
page = agent.submit(google_form)
pp page

Mechanize is a really great library, as it's not just a plain HTTP GET/POST request-and-fetch tool: it keeps track of cookies, closely emulating real web browser behavior.

Mladen Jablanović
+4  A: 

I have been using Ruby and Watir to do just that. It's very straightforward and works by automating IE or Firefox.

With this approach the browser handles any JavaScript mess, but you still have complete access to the page content, so you just need to add your own logic, like filling out an online form.

Alon
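That "own logic" step, deciding what to fill in based on what the page actually contains, can be sketched with Python's standard-library HTML parser (the sample form markup here is made up):

```python
from html.parser import HTMLParser

class FormFieldFinder(HTMLParser):
    """Collect the names of <input> fields so a script can decide what to fill."""
    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if "name" in attrs:
                self.fields.append(attrs["name"])

finder = FormFieldFinder()
finder.feed('<form><input name="user"><input name="pass" type="password"></form>')
print(finder.fields)  # -> ['user', 'pass']
```

A script could branch on which field names show up, behaving differently depending on what the page displays, which is exactly what the question asks for.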
+2  A: 

Many of these answers are oriented toward scraping applications. If that is what you want, use the equivalent of WWW::Mechanize in your language of choice (Perl's is the canonical one; Python's works well too).

However, from your question it sounds like you may want to automate tests for websites. If that is the case, then in addition to a framework for testing the correctness of the returned HTML for any given page, you also want an in-browser testing framework.

Two that stand out are Twill and Selenium.

These provide exactly what you're asking for: a simple interface for browsing websites from a script. They also allow more control over your browsing, particularly with respect to JavaScript and the various effects that manifest themselves as you move forward and back through a website, leaving a trail.

Paul McMillan
+1  A: 

WatiN is another .NET way to browse and perform various actions.

JB King