views: 72

answers: 4

This question shows my ignorance of regular expressions. I've never understood them well enough.

If I wanted to match, for instance, just the URL portion of an <a> tag in HTML, what would I need to do?

My regular expression to get the entire tag is:

<A[^>]*?HREF\s*=\s*[""']?([^'"" >]+?)[ '""]?>

I have no idea what I would need to do to get the URL out of that and I have no clue where to look in regular expression documentation to figure this out.

+3  A: 

If you're programming in Perl, you can use the $1 capture variable inside an if() statement. For example:

# After a successful match, $1 holds the text captured by the first (...) group.
if ( $HREF =~ /<A[^>]*?HREF\s*=\s*[""']?([^'"" >]+?)[ '""]?>/ ) {
    print $1;
}
Suroot
Thanks, that gave me enough insight to solve my problem. Grouping was what I needed :-)
Bob
Glad I could help ^^
Suroot
+2  A: 

Exactly how depends on the regex library you're using, but the general approach is to use a grouped expression. You actually already have one in your example, since grouped expressions are the parenthesized parts of the pattern. The href attribute value is your first group (your zeroth group is the whole match).
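
As one concrete case, here is a minimal sketch in Python. That choice is an assumption, since the question doesn't say which library is in use. The sample HTML string and the names ANCHOR_RE, html, whole_tag and url are made up for illustration, and I've added case-insensitive matching so the uppercase pattern also matches a lowercase tag.

```python
import re

# The question's pattern, with each doubled "" written as a single " (the
# doubling looks like string-literal escaping carried over from another language).
ANCHOR_RE = re.compile(
    r"""<A[^>]*?HREF\s*=\s*["']?([^'" >]+?)[ '"]?>""",
    re.IGNORECASE,  # assumption: match <a href=...> as well as <A HREF=...>
)

html = '<a href="http://example.com/page">link</a>'  # made-up sample input
match = ANCHOR_RE.search(html)

whole_tag = match.group(0)  # group 0: everything the pattern matched
url = match.group(1)        # group 1: the first parenthesized group

print(whole_tag)
print(url)
```

Other libraries expose the same two groups under different names, so check the documentation for whichever one you're using.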

TokenMacGuy
+1  A: 

You can use round brackets (parentheses) to group parts of a regular expression match. In this case, put round brackets around the URL part and then use a number to refer to that group later. See here for exactly how to do this.

Rahul
A: 

I switched things up a bit - try something like this:

<a[^>]*href="([^"]*).*>
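
A quick way to see what this pattern captures is to run it against a couple of inputs. The sketch below uses Python, which is an assumption because the answer doesn't name a language. Both sample strings and the names HREF_RE, double_match and single_match are made up for illustration.

```python
import re

# The pattern from this answer, unchanged.
HREF_RE = re.compile(r'<a[^>]*href="([^"]*).*>')

double_quoted = '<a href="http://example.com/page">link</a>'  # made-up input
single_quoted = "<a href='http://example.com/page'>link</a>"  # made-up input

double_match = HREF_RE.search(double_quoted)
single_match = HREF_RE.search(single_quoted)

# Group 1 is the text between the double quotes after href=.
print(double_match.group(1))

# The pattern requires a literal " after href=, so a single-quoted
# attribute value does not match at all and search() returns None.
print(single_match)
```

So as written, the pattern only handles href values wrapped in double quotes.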
Andrew Hare
The problem is that you can use a ' within the href tag.
Suroot
Suroot: do you mean you can't use a single quote in an href attribute? Why not?
eyelidlessness