I wrote a Perl script a while ago that logged into my online banking and emailed me my balance and a mini-statement every day. I found it very useful for keeping track of my finances. The only problem is that I wrote it using just Perl and curl, and it was quite complicated and hard to maintain. After a few instances of my bank changing their web pages, I got fed up with debugging it to keep it up to date.

So what's the best way to write such a program so that it's easy to maintain? I'd like to write a nice, well-engineered version in either Perl or Java that will be easy to update when the bank inevitably fiddles with its website.

+2  A: 

Hmm, just found

Finance::Bank::Natwest

which is a Perl module specifically for my bank! I wasn't expecting it to be quite that easy.

Benj
Hmm, sadly it doesn't look like that Perl module works anymore. Last updated in 2003!
Benj
Wow. That's old. But although it obviously has not been maintained, it might still be maintainable?
innaM
Yes, it does look quite well written. I've emailed the author to ask him if he still uses it personally.
Benj
I would strongly discourage using this sort of script. Most banking websites ask for a random PART of the PIN/password on each login. This is to defend against key-loggers, spyware and man-in-the-middle attacks. BEWARE!!!
heferav
Yes, I'm aware there are security implications, and obviously I parse the web pages from the bank to work out which digits of my password they want. The password itself is stored in a GPG-encrypted file, using gpg-agent for easy access. I know it's security by obscurity, but my setup is so unique that I don't fear intrusion all that much.
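Roughly, the retrieval looks like this (the path is a made-up placeholder; gpg-agent caches the passphrase so the script can run unattended):

# Decrypt the stored password at runtime; gpg-agent supplies the passphrase
chomp( my $password = `gpg --quiet --batch --decrypt ~/.bank-password.gpg` );
die "gpg decryption failed\n" if $?;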
Benj
+5  A: 

If I were to give you one piece of advice, it would be to use XPath for all your scraping needs. Avoid regexes.
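
For instance, with HTML::TreeBuilder::XPath (mentioned in the comments below) you can pull a value out of a page with a single expression. A minimal sketch; the markup and class name are made-up stand-ins for whatever the bank actually serves:

use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup standing in for the bank's real page
my $html = '<table><tr><td class="balance">1,234.56</td></tr></table>';

my $tree    = HTML::TreeBuilder::XPath->new_from_content($html);
my $balance = $tree->findvalue('//td[@class="balance"]');
print "Balance: $balance\n";
$tree->delete;    # free the parse tree when done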

Geo
+1. Definitely: something with a native HTML parser is going to be much better in the long run.
bobince
Yes, my previous solution was regexp-heavy. I'll definitely avoid that this time if possible.
Benj
Unless the HTML is very exceptional or you are using `HTML::TreeBuilder::XPath`, this is bound to be frustrating.
Sinan Ünür
+13  A: 

In Perl, something like WWW::Mechanize can already make your script simpler and more robust, because it can find the HTML forms in previous responses from the website. You can fill in these forms to prepare a new request. For example:

use strict;
use warnings;
use WWW::Mechanize;

my ( $url, $password ) = ( 'https://bank.example.com/login', '...' );  # placeholders

my $mech = WWW::Mechanize->new( autocheck => 1 );    # die on HTTP errors
$mech->get($url);                                    # fetch the login page
$mech->submit_form(
    form_number => 1,                                # first form on the page
    fields      => { password => $password },
);
die "Login failed\n" unless $mech->success;
Bruno De Fraine
+8  A: 

WWW::Mechanize and Web::Scraper in combination are the two tools that make me most productive. There's a nice article about that combination at catalyzed.org.
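
A minimal sketch of the combination, where Mechanize handles the session and Web::Scraper declaratively extracts the data. The URL and CSS selectors are made-up placeholders; the real ones depend on the bank's markup:

use strict;
use warnings;
use WWW::Mechanize;
use Web::Scraper;

# Declare what to extract (hypothetical selectors)
my $statement = scraper {
    process 'table.mini-statement tr', 'rows[]' => scraper {
        process 'td.date',   date   => 'TEXT';
        process 'td.desc',   desc   => 'TEXT';
        process 'td.amount', amount => 'TEXT';
    };
};

my $mech = WWW::Mechanize->new;
$mech->get('https://bank.example.com/statement');    # placeholder URL
my $res = $statement->scrape( $mech->content );

for my $row ( @{ $res->{rows} || [] } ) {
    print "$row->{date}  $row->{desc}  $row->{amount}\n";
}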

singingfish
+1 for Web::Scraper. I've found it hard to install, but I've been able to replace huge scraping scripts with about 3 lines of Web::Scraper.
Peter Kovacs
+1  A: 

A lot of banks publish their data in a standard format commonly used by personal finance packages such as MS Money or Quicken to download transaction information. You could look for that hook, download using the same API, and then parse the data on your end (e.g. parse Excel documents with Spreadsheet::ParseExcel, and Quicken files with Finance::QIF).
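
For instance, a minimal sketch of reading a downloaded QIF file with Finance::QIF (the filename is hypothetical; field names vary by record type, so this just dumps whatever each record contains):

use strict;
use warnings;
use Finance::QIF;

my $qif = Finance::QIF->new( file => 'statement.qif' );    # hypothetical file

while ( my $record = $qif->next ) {
    print "Record type: $record->{header}\n";
    for my $field ( sort keys %$record ) {
        next if $field eq 'header';
        next if ref $record->{$field};    # skip nested structures like splits
        print "  $field: $record->{$field}\n";
    }
}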

Edit (reply to comment): Have you considered contacting your bank and asking how you can programmatically log into your account to download the financial data? Many/most banks have an API for this (which Quicken etc. make use of, as described above).

Ether
Hi, thanks for the answer, but it's not really parsing the banking data I'm after; it's logging in and automatically getting around the banking environment that I want.
Benj
+1  A: 

There's a currently up-to-date Ruby implementation here:

http://github.com/warm/NatWoogle

anonymous
A: 

Use Perl and the Web::Scraper package: link text

juFo