I want to write a regex that matches a URL ending in ".mp4" when a line contains multiple URLs.

For example, for the following line:

"http://www.link.org/1610.jpg","Debt","http://www.archive.org/610_.mp4","66196517"

The following pattern matches everything from the first http up to .mp4:

(http:\/\/[^"].*?\.mp4)[",].*?

How can I make it match only the last URL?

Note that a line may contain any number of URLs with anything in between, but only the last URL ends in .mp4.

+2  A: 

Use:

.*"(http:\/\/[^"].*?\.mp4)".*

Quantifiers are greedy by default. The leading .* starts by grabbing the entire string and then backtracks until the rest of the pattern finds a URL, so the match lands on the last one. It's probably not the most efficient way to do it, but that doesn't really matter since you're only running it on one line of text (unless, say, the line is tens of millions of characters long).

By the way, the piece you had at the end ([",]) wasn't quite right. That is a character class: it matches either " or a comma, whereas I suspect you meant to match the two characters in sequence (based on your sample line).
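To illustrate the difference between the character class and the literal sequence, here is a small sketch using Python's re module (the language is my choice; the question doesn't name one, and the variable names are only illustrative):

```python
import re

sample = '","'

# [",] is a character class: each match is ONE character, either " or a comma.
either_chars = re.findall(r'[",]', sample)

# ", is a literal two-character sequence: a quote followed by a comma.
sequence = re.findall(r'",', sample)

print(either_chars)  # ['"', ',', '"']
print(sequence)      # ['",']
```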

Lastly, you don't need to make the final wildcard non-greedy. In fact, you don't need it at all if you're doing a find rather than trying to match the entire line.
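As a rough sketch of the above, assuming Python's re module (the question doesn't say which language or engine is in use), this compares the pattern from the question with the greedy-prefix version on the sample line. Since re.search does a find, the trailing .* is left off. The variable names are only illustrative.

```python
import re

line = '"http://www.link.org/1610.jpg","Debt","http://www.archive.org/610_.mp4","66196517"'

# Pattern from the question: the match starts at the FIRST http:// and the
# lazy .*? expands across quotes and commas until it reaches .mp4.
original = re.search(r'(http://[^"].*?\.mp4)[",].*?', line)
print(original.group(1))
# http://www.link.org/1610.jpg","Debt","http://www.archive.org/610_.mp4

# Greedy prefix: .* consumes the whole line, then backtracks until the rest
# of the pattern can match, which happens at the LAST "http://.
greedy = re.search(r'.*"(http://[^"].*?\.mp4)"', line)
print(greedy.group(1))
# http://www.archive.org/610_.mp4
```

In Python there is no need to escape the slashes; the \/ form is only required where / is the regex delimiter.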

cletus
A: 

Try with

,\s*"(http://[^"]*?\.mp4)"\s*,\s*.*$

(This is PCRE with a delimiter other than /, e.g. |.) It matched http://www.archive.org/610_.mp4. It assumes the quotes open and close the link directly, i.e. " link " with spaces inside the quotes is not allowed; otherwise, add \s*? to match those spaces too. Another possibly wrong assumption: the link is the last link on the line but not the last element. If it can be the last element, end the RE with mp4)"$ instead of the current ending.
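Here is a quick check of this pattern and of the mp4)"$ variant, sketched with Python's re module rather than PCRE (the syntax used here is the same in both). The second line, last_line, is a made-up example where the link is the last element.

```python
import re

line = '"http://www.link.org/1610.jpg","Debt","http://www.archive.org/610_.mp4","66196517"'
# Hypothetical line where the .mp4 link is the last element.
last_line = '"Debt","http://www.archive.org/610_.mp4"'

# [^"]*? cannot cross a quote, so the capture stays inside one quoted field.
pattern = re.compile(r',\s*"(http://[^"]*?\.mp4)"\s*,\s*.*$')
m = pattern.search(line)
print(m.group(1))  # http://www.archive.org/610_.mp4

# The pattern requires a comma after the link, so it fails on last_line.
print(pattern.search(last_line))  # None

# Variant for a link that is the last element of the line.
end_pattern = re.compile(r',\s*"(http://[^"]*?\.mp4)"$')
print(end_pattern.search(last_line).group(1))  # http://www.archive.org/610_.mp4
```

Both patterns also require a comma before the link, so a link in the very first field of a line would not match.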

ShinTakezou
... but as the best answer notes, you can indeed stop at `mp4)"`...
ShinTakezou