views: 428
answers: 12

Often I want to automate HTTP queries. I currently use Java (with Commons HttpClient), but would prefer a scripting-based approach: something really quick and simple, where I can set a header and go to a page without setting up an entire OO lifecycle, setting each header, or calling up an HTML parser... I am looking for a solution in ANY language, preferably a scripting one.

A: 

What about using PHP+Curl, or just bash?

Mr-sk
+2  A: 

Python urllib may be what you're looking for.

Alternatively, PowerShell exposes the full .NET HTTP library in a scripting environment.
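
A minimal urllib2 sketch (Python 2; the module became urllib.request in Python 3), with a placeholder URL and header:

import urllib2

# Build a request with a custom header, fetch the page, print the body.
req = urllib2.Request('http://www.example.com/',
                      headers={'User-Agent': 'my-script/0.1'})
resp = urllib2.urlopen(req)
print(resp.read())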

John Weldon
Often, folks need urllib2 more than urllib.
S.Lott
+4  A: 

Mechanize for Python seems easy to use: http://wwwsearch.sourceforge.net/mechanize/
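
A minimal sketch of a mechanize session (the URL and form field names are hypothetical):

import mechanize

br = mechanize.Browser()
br.addheaders = [('User-Agent', 'my-script/0.1')]  # set a custom header
br.open('http://www.example.com/login')
br.select_form(nr=0)       # select the first form on the page
br['username'] = 'me'      # fill fields by name
br['password'] = 'secret'
resp = br.submit()         # cookies and redirects are handled for you
print(resp.read())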

emil
Mechanize for Ruby also works well.
Wayne Conrad
It seems to exist for Perl as well.
emil
+6  A: 

Have a look at Selenium. It generates code for C#, Java, Perl, PHP, Python, and Ruby if you need to customize the script.
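
For a flavour of the scripted side, a rough sketch of a Selenium RC session driven from Python (it assumes a Selenium server running on localhost:4444 and uses a hypothetical page and link text):

from selenium import selenium

# Launch Firefox through the Selenium server and drive a page.
sel = selenium('localhost', 4444, '*firefox', 'http://www.example.com/')
sel.start()
sel.open('/')                       # navigate relative to the base URL
sel.click('link=Example link')      # locate the link by its text
sel.wait_for_page_to_load('30000')  # timeout in milliseconds
print(sel.get_html_source())
sel.stop()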

jbochi
Of the code that is generated, what HTTP libraries does it use? The default ones, or ones like mechanize?
Zombies
Selenium has a wrapper library for each one of the languages.
jbochi
As far as I understand Selenium, it uses a fully fledged browser like Firefox. This seems to me like killing a mouse with a nuclear bomb.
johannes
Hardly, johannes. It's using a browser (which is the most common way to use web apps) to test web apps (which are most commonly accessed through a browser).
Noufal Ibrahim
I tried Selenium. I was considering using it to initially build a script and then edit it. Here is an example of the API: @selenium.click "link=Yahoo!". To me this is problematic if the link text changes or if the link is dynamically generated.
Zombies
+6  A: 

Watir sounds close to what you want, although it (like Selenium, linked to in another answer) actually opens up a browser to do stuff. You can see some examples here. Another browser-based record-and-playback system is Sahi.

If your application uses WSGI, then paste is a nice option.
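
A rough sketch of the paste.fixture approach, with a trivial WSGI app stubbed in for illustration:

from paste.fixture import TestApp

def app(environ, start_response):
    # A stand-in WSGI application, just for demonstration.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['hello']

# Requests run in-process; no server or browser is involved.
test_app = TestApp(app)
resp = test_app.get('/')
assert resp.status == 200
print(resp.body)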

Mechanize, linked to in another answer, is a "browser in a library", and there are clones in Perl, Ruby, and Python. The Perl one is the original, and it seems to be the way to go if you don't want a browser. The problem with this approach is that the front-end code (which might rely on JavaScript) won't be exercised.

Noufal Ibrahim
Does HttpUnit for Java execute JavaScript?
Zombies
Also, when it opens a browser... does it become an active window and manipulate clicks/events?
Zombies
Never used it. Unless there's a browser involved (or it loads up a JS interpreter somehow), I don't think it will.
Noufal Ibrahim
It's hard for me to describe how the browser based systems work. The Selenium homepage has a few screencasts. If you view them, I'm sure you'll understand.
Noufal Ibrahim
@Zombies: httpunit supports some javascript, see http://www.httpunit.org/doc/javascript-support.html
stephan
A: 

Some ruby libraries:

  • httparty: really interesting; the philosophy behind it is appealing.
  • mechanize: a classic, good-quality web automation library.
  • scrubYt: puzzling at first glance but fun to use.
paradigmatic
+6  A: 

My turn: wget, or Perl with LWP. You'll find examples on the linked page.

Aif
+3  A: 

Depending on exactly what you're doing, the easiest solution looks to be bash + curl.

The man page for the latter is available here:

http://curl.haxx.se/docs/manpage.html

You can do POSTs as well as GETs, use HTTPS, show headers, work with cookies, use basic and digest HTTP authentication, and tunnel through all sorts of proxies (including NTLM on *nix), among other things.

curl is also available as a shared library, with C and PHP support.
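
There are Python bindings too (pycurl); a minimal sketch with a placeholder URL and header:

import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://www.example.com/')
c.setopt(pycurl.HTTPHEADER, ['X-My-Header: value'])  # custom request header
c.setopt(pycurl.WRITEFUNCTION, buf.write)            # collect the response body
c.perform()
c.close()
print(buf.getvalue())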

HTH

C.

symcbean
+4  A: 

I'm testing ReST APIs at the moment and found the ReST Client very nice. It's a GUI program, but you can nonetheless save and restore queries as XML files (or have them generated), embed them, write test scripts, and so on. And it's Java-based (not an advantage per se, but you mentioned it).

Minus points for recording sessions, though: the ReST Client is good for stateless "one-shots".

If it doesn't suit your needs, I'd go for the already-mentioned Mechanize (or WWW-Mechanize, as it is called on CPAN).

Boldewyn
+2  A: 

Twill is pretty good and made for testing. It can be used as a script, in an interactive session, or from within a Python program.
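
For example, a rough sketch using twill's Python API (the URL and form field names are hypothetical):

from twill.commands import go, fv, submit, code, show

# Fetch a login page, fill in the first form, submit, check the result.
go('http://www.example.com/login')
fv('1', 'username', 'me')      # form 1, field "username"
fv('1', 'password', 'secret')
submit()
code(200)                      # assert on the HTTP response code
show()                         # print the resulting page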

ars
+6  A: 

If you have simple needs (fetch a page and then parse it), it is hard to beat LWP::Simple and HTML::TreeBuilder.

use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder;

my $url = 'http://www.example.com';
my $content = get($url) or die "Couldn't get $url";

# new_from_content parses the markup and calls eof for us.
my $t = HTML::TreeBuilder->new_from_content($content);
$t->elementify;    # downgrade to a plain HTML::Element tree

# In scalar context, look_down returns the first match:
my $thing = $t->look_down( _tag => 'p', id => qr/match_this_regex/ );

print $thing ? $thing->as_text . "\n" : "No match found\n";

# In list context, it returns all matches:
my @things = $t->look_down( _tag => 'p', id => qr/match_this_regex/ );

print @things ? join( "\n", map { $_->as_text } @things ) . "\n"
              : "No match found\n";
daotoad
Yup, LWP and HTML::TreeBuilder usually go together.
Leonardo Herrera
@Leonardo, like chocolate and peanut butter--good on their own, but better together.
daotoad
+2  A: 

Perl and WWW::Mechanize can make web scraping and similar tasks simple and easy, including easy handling of forms (say you want to go to a login page, fill in a username and password, and submit the form, handling cookies and hidden session identifiers just as a browser would...).

Similarly, finding or extracting links from the fetched page is trivial.

If you need to parse stuff out of the resulting pages that WWW::Mechanize can't easily help with, then feed the result to HTML::TreeBuilder to make parsing easy.

David Precious