How can I use Perl regexps to extract all URLs of a specific domain (with possibly variable subdomains) and a specific extension from plain text? I have tried:
my $stuff = 'omg http://fail-o-tron.com/bleh omg omg omg omg omg http://homepage.com/woot.gif dfgdfg http://shomepage.com/woot.gif aaa';
while($stuff =~ m/(http\:\/\/.*?homepage.com\/.*?\.gif)/gmsi)
{
    print $1."\n";
}
It fails horribly and gives me:
http://fail-o-tron.com/bleh omg omg omg omg omg http://homepage.com/woot.gif
http://shomepage.com/woot.gif
I thought that shouldn't happen because I am using .*? which ought to be non-greedy and give me the smallest match. Can anyone tell me what I am doing wrong? (I don't want some overly complex, canned regexp to validate URLs; I want to know what I am doing wrong so I can learn from it.)
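For reference, here is a minimal sketch of what seems to be going on. A non-greedy quantifier only yields the shortest match from the leftmost position where a match can start at all. The engine starts at the first http://, and since . matches spaces (and, under /s, newlines too), .*? simply keeps expanding until it reaches homepage.com. The sketch below assumes that a URL never contains whitespace and that subdomain labels consist of word characters and hyphens; both are my assumptions, not general URL rules.

```perl
use strict;
use warnings;

my $stuff = 'omg http://fail-o-tron.com/bleh omg omg omg omg omg http://homepage.com/woot.gif dfgdfg http://shomepage.com/woot.gif aaa';

# Assumption: URLs contain no whitespace, so \S replaces the dot.
# (?:[\w-]+\.)* allows zero or more subdomain labels, each ending in a dot,
# so "shomepage.com" is no longer mistaken for "homepage.com".
# The dots in "homepage\.com" and "\.gif" are escaped to match literal dots.
my @urls = $stuff =~ m{(http://(?:[\w-]+\.)*homepage\.com/\S*?\.gif)}gi;

print "$_\n" for @urls;
# prints: http://homepage.com/woot.gif
```

Using m{...} instead of m/.../ avoids having to escape every slash, and the /m and /s modifiers are dropped because the pattern no longer contains ^, $ or an unescaped dot.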