slurp

How to set up a robots.txt that allows only the default page of a site

Say I have a site at http://website.com. I would really like to allow bots to see the home page, but every other page needs to be blocked, as it is pointless to spider. In other words, http://website.com and http://website.com/ should be allowed, but http://website.com/anything and http://website.com/someendpoint.ashx should be blocked. Further...

In Perl, how can I read an entire file into a string?

So I'm working on a server where I can't install any modules whatsoever. Yes, this makes my job difficult. To complete the rest of my script, I need to do something that I had thought would be fairly straightforward, but it seems to be anything but. I'm trying to open an .html file as one big long string. This is what I've got: ope...
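
A minimal sketch of the usual core-Perl approach to this kind of task, needing no extra modules: localize `$/` (the input record separator) so that it is undefined, and a single read then returns the whole file. The helper name `read_file_to_string` and the sample file name are hypothetical and are not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the full contents of $path as one string.
sub read_file_to_string {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;             # undefined record separator: read to end of file
    my $content = <$fh>;  # scalar context, so one read returns everything
    close($fh);
    return $content;
}

# Demo: write a small sample file, then slurp it back.
my $sample = 'slurp_sample.html';   # hypothetical file name
open(my $out, '>', $sample) or die "Cannot write $sample: $!";
print {$out} "<html>\n<body>hi</body>\n</html>\n";
close($out);

my $html = read_file_to_string($sample);
print length($html), " characters read\n";
unlink $sample;
```

Because `$/` is localized inside the sub, the change is undone when the sub returns, so line-by-line reads elsewhere in the script should be unaffected.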

How do I read a file's contents into a Perl scalar?

Hi, what I am trying to do is get the contents of a file from another server. Since I'm not familiar with Perl or its modules and functions, I've gone about it this way: my $fileContents; if( $md5Con =~ m/\.php$/g ) { my $ftp = Net::FTP->new($DB_ftpserver, Debug => 0) or die "Cannot connect to some.host.name: $@"; $ftp->log...

Why is Yahoo! Slurp requesting /1338.aspx?

Ip: 67.195.112.247 Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp) System.Web.HttpException: The file '/1338.aspx' does not exist. IP : 67.195.112.247 Host : b3091104.crawl.yahoo.net Country: United States ...

How does Perl's Slurp module work?

I had a look at the source of Slurp and I would love to understand how slurp() works: sub slurp { local( $/, @ARGV ) = ( wantarray ? $/ : undef, @_ ); return <ARGV>; } Where is the file even opened? ...
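
As a hedged illustration of the quoted sub: reading from the special `ARGV` filehandle makes Perl open each file named in `@ARGV` implicitly, which is where the opening happens. The sketch below copies the shape of the quoted code under a hypothetical name, `slurp_sketch`, and uses a hypothetical sample file; it is not the module's actual source beyond the line quoted above.

```perl
use strict;
use warnings;

# Same shape as the quoted sub, renamed to mark it as an illustration.
sub slurp_sketch {
    # @ARGV temporarily becomes the list of file names passed in.
    # In list context $/ keeps its current value (one element per line);
    # in scalar context it becomes undef (the whole file as one string).
    local( $/, @ARGV ) = ( wantarray ? $/ : undef, @_ );
    # Reading the magic ARGV handle opens the files listed in @ARGV.
    return <ARGV>;
}

# Demo with a throwaway file (hypothetical name).
my $sample = 'slurp_sketch_sample.txt';
open(my $out, '>', $sample) or die "Cannot write $sample: $!";
print {$out} "first line\nsecond line\n";
close($out);

my $whole = slurp_sketch($sample);   # scalar context: one string
my @lines = slurp_sketch($sample);   # list context: one element per line
print scalar(@lines), " lines, ", length($whole), " characters\n";
unlink $sample;
```

Since both `$/` and `@ARGV` are set with `local`, their previous values come back when the sub returns.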