HTML scraping is pretty well documented from what I can see, and I understand both the concept and the implementation. What is the best method for scraping content that is tucked away behind authentication forms? I am referring to content that I legitimately have access to, so what I'm looking for is a way to submit login data automatically.
All I can think of is setting up a proxy, capturing the traffic from a manual login, and then writing a script that replays that traffic (the login request and the resulting session cookies) as part of the HTML scraping run. As far as language goes, it would likely be done in Perl.
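To make the idea concrete, here is a rough sketch of what I have in mind, using LWP::UserAgent with a cookie jar. The URLs and the form field names (`username`, `password`) are placeholders I made up; the real ones would come from the captured login request. It also assumes the site uses a plain form POST plus a session cookie, with no CSRF token or JavaScript involved, and that LWP::Protocol::https is installed for HTTPS.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical URLs and field names: replace with the values seen in the
# captured manual login (form action, input names, protected page).
my $login_url   = 'https://example.com/login';
my $content_url = 'https://example.com/members/report.html';
my %credentials = (
    username => 'my_user',
    password => 'my_password',
);

# The in-memory cookie jar keeps the session cookie between requests.
my $ua = LWP::UserAgent->new(
    cookie_jar => HTTP::Cookies->new,
    agent      => 'Mozilla/5.0',
);

# LWP does not follow redirects after a POST by default; many login forms
# answer with a 302, so allow it here.
push @{ $ua->requests_redirectable }, 'POST';

# Replay the login form submission as a form-encoded POST.
my $login = $ua->post($login_url, \%credentials);
die 'Login request failed: ' . $login->status_line . "\n"
    unless $login->is_success;

# Fetch the protected page; the session cookie is sent automatically.
my $page = $ua->get($content_url);
die 'Fetch failed: ' . $page->status_line . "\n"
    unless $page->is_success;

# Hand the HTML to whatever scraping code is already in place.
my $html = $page->decoded_content;
print length($html), " characters of HTML fetched\n";
```

A successful HTTP status on the login request does not prove the login itself worked, since many sites return 200 with an error page, so the fetched HTML would still need a check for something only a logged-in user sees.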
Has anyone had experience with this, or just a general thought?
Edit: This has been answered before, but with .NET. While that validates how I think it should be done, does anyone have a Perl script to do this?