ansaurus

Question

Why doesn't the match operator match anything?

Answer 1

+7 A:

The /x modifier on your regex makes Perl ignore the literal whitespace in the pattern. Remove it.

That is, it should be

if (/\s<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/si)

/x makes Perl ignore unescaped whitespace inside the regex, which makes your regex equivalent to the following:

/\s<divclass="v120WrapperInner"><ahref="([^"]*)"title="([^"]*)"><img/si

That will not match, because the input contains "<div class" and "<a href" with spaces.
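A minimal sketch of the effect described above, using core Perl only. The `$html` string is a hypothetical, shortened stand-in for the real markup, and the variable names are made up for illustration. It runs the same pattern without /x, with /x, and with /x plus escaped spaces.

```perl
use strict;
use warnings;

# Hypothetical shortened input; the real page markup is much longer.
my $html = '<div class="v120WrapperInner"><a href="/watch" title="Demo"><img src="x.jpg">';

# Without /x, spaces in the pattern are literal and must appear in the input.
my $plain = $html =~ /<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/si;

# With /x, unescaped whitespace in the pattern is ignored, so this effectively
# searches for '<divclass=' and '<ahref=', which the input does not contain.
my $with_x = $html =~ /<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/six;

# Under /x, a literal space can still be written as '\ ' or '[ ]'.
my $escaped = $html =~ /<div\ class="v120WrapperInner"> <a[ ]href="([^"]*)"\ title="([^"]*)"> <img/six;

printf "plain: %s, with /x: %s, escaped: %s\n",
    map { $_ ? 'match' : 'no match' } $plain, $with_x, $escaped;
# prints: plain: match, with /x: no match, escaped: match
```

If you want to keep /x for readability, escaping every literal space this way is the usual price.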

Also, that \s at the beginning may break things.
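A small sketch of why the leading \s matters, again with hypothetical inputs and variable names: \s must consume exactly one whitespace character, so the pattern cannot match when the tag is the very first thing in the string, whereas \s* also accepts zero.

```perl
use strict;
use warnings;

# Hypothetical inputs: one starts directly with the tag, one has a newline first.
my $at_start      = '<div class="v120WrapperInner">';
my $after_newline = "\n" . '<div class="v120WrapperInner">';

# \s needs one whitespace character before '<', so this fails at string start.
my $strict_at_start = $at_start =~ /\s<div class="v120WrapperInner">/;

# The same pattern succeeds when whitespace precedes the tag.
my $strict_after_nl = $after_newline =~ /\s<div class="v120WrapperInner">/;

# \s* allows zero or more whitespace characters, so it matches either input.
my $optional_at_start = $at_start =~ /\s*<div class="v120WrapperInner">/;

printf "%s / %s / %s\n",
    map { $_ ? 'match' : 'no match' }
    $strict_at_start, $strict_after_nl, $optional_at_start;
# prints: no match / match / match
```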

This is the code I've used for testing:

use strict;
use warnings;

my $inp = '<div class="v120WrapperInner"><a href="/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&amp;adtype=pyv&amp;event=ad&amp;usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=" title="The Valley Downs Chicago"><img class="vimg120" alt="The Valley Downs Chicago" src="http://i2.ytimg.com/vi/91sYT_8CN8Q/1.jpg"&gt;';

print "$inp\n";

if ( $inp =~ /<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/si )
{
    print "m:\n$1\n$2\n";
}

n0rd 2009-11-01 22:38:35

hmmm...that didn't seem to make a difference.

BeachRunnerJoe 2009-11-01 22:44:32

I've just tested it, it works perfectly. But I've removed \s at the beginning.

n0rd 2009-11-01 22:46:48

perfect! the "\s" was causing the problem. thanks so much! why is that "\s" problematic?

BeachRunnerJoe 2009-11-01 22:51:25

Because it requires a whitespace character before the opening angle bracket.

n0rd 2009-11-01 22:52:01

learning a lot! thanks again :)

BeachRunnerJoe 2009-11-01 22:53:44

This is one of the reasons people shouldn't blindly do anything Perl Best Practices says. If you don't understand the match operator options, don't use them.

brian d foy 2009-11-02 14:39:34

Answer 2

A:

G'day,

If you're having problems understanding regexps, can I suggest having a read of the regexp intro in Dale Dougherty's excellent book "sed & awk" (sanitised Amazon link).

Definitely one of the best intros to regexps around.

HTH

cheers,

Rob Wells 2009-11-01 22:42:09

This appears to use an advertising/referral link, rather than going directly to http://www.amazon.com/dp/1565922255/ ?

Peter Boughton 2009-11-02 14:28:44

@Peter, oops. my late night mistake. it's not a complete referral link as there's no id in it tho. I've changed it to point to the proper vanilla link.

Rob Wells 2009-11-02 14:37:52

Actually, when I go in and look at the raw markdown, the link **does** in fact point to the vanilla amazon.com/dp/ISBN-10

Rob Wells 2009-11-02 14:40:38

Hmm, so this is something SO is doing. :/ Not sure why - there's no need to mask links with Amazon referrals. Due to more SO stupidity, I had to do a dummy edit to your post so I could remove my previous downvote.

Peter Boughton 2009-11-02 14:49:03

Please stop downvoting this answer. It is SO that is rewriting the links now. See this post on meta <http://meta.stackoverflow.com/questions/26964/auto-inserting-stack-overflow-affiliate-into-all-amazon-book-links>

Rob Wells 2009-11-02 14:50:06

Unfortunately, whoever the downvoter(s) are have no way of knowing unless they manually come back to check (something I always try to do), because SO doesn't have any form of 'you commented/voted on this and now things have changed' notification. There's a meta request out there for that, but it's been around since UserVoice was used, so I'm not sure it'll get implemented any time soon.

Peter Boughton 2009-11-02 15:00:02

Answer 3

+2 A:

It is good that you are gaining experience with regexes in Perl, but for this type of work you might consider using a DOM parser like XML::DOM.

Ewan Todd 2009-11-01 22:47:48

Answer 4

+3 A:

Okay, this is not exactly what you are asking, but I think (based on this and your older question) that you are parsing HTML.

Let me tell you this: regexes aren't the solution. You should use HTML::TreeBuilder to parse HTML documents, because HTML documents are horribly messy.

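#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Expects the HTML to parse after a __DATA__ marker at the end of this script.
my $root = HTML::TreeBuilder->new_from_file(\*DATA);
foreach my $div ($root->find_by_tag_name('div')) {
    # attr() returns undef when the div has no class attribute.
    my $class = $div->attr('class');
    next unless defined $class && $class eq 'v120WrapperInner';
    # Loop over every link inside the div, not just the first one.
    foreach my $link ($div->find_by_tag_name('a')) {
        print "m:\n", $link->attr('href'), "\n", $link->attr('title'), "\n";
    }
}
$root->delete;    # free the parse tree when done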

Leonardo Herrera 2009-11-02 13:22:24
