
I'm trying to write a Python regular expression that will match both of these URLs:

http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park
http://www.waymarking.com/waymarks/WM6N3G

and for both will capture:

http://www.waymarking.com/waymarks/WM6N3G

This is what I have:

(http://www.waymarking.com/waymarks/.*?)_?.*?

But it only matches:

http://www.waymarking.com/waymarks/

Thanks!

+4  A: 
(http://www\.waymarking\.com/waymarks/[^_]*).*
cdm9002
Better than my answer. The [^_] is great.
Chris Thompson
Damn, was too slow :(. Possibly should use + given that there should always be one or more characters.
rezzif
Note that a negated character class will also match newlines. Of course, it depends on the format of the source strings whether this could be an issue.
Geert
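A minimal Python 3 sketch of the [^_] answer, assuming each URL is checked on its own with re.match. The dots are escaped so they only match a literal ".", and + is used as suggested in the comments; the name WAYMARK_RE is illustrative, not from the thread.

```python
import re

# Group 1 is everything up to (not including) the first "_".
# Escaped dots match only ".", and + requires at least one ID character.
WAYMARK_RE = re.compile(r"(http://www\.waymarking\.com/waymarks/[^_]+).*")

urls = [
    "http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park",
    "http://www.waymarking.com/waymarks/WM6N3G",
]

for url in urls:
    m = WAYMARK_RE.match(url)
    print(m.group(1))  # the base URL in both cases
```

Because [^_] also matches spaces and newlines, this only works cleanly when the string is a single URL, which is the caveat raised in the comments.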
A: 

A non-regex way:

url = "http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park"
s = url.split("_")  # everything before the first "_" ends up in s[0]
print(s[0])
ghostdog74
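A variant of the split() approach above, swapping in str.partition, which splits only at the first "_" and does not build a list of all the pieces. This is a sketch assuming the host and "/waymarks/" path never contain an underscore; the helper name waymark_base is hypothetical.

```python
def waymark_base(url):
    # partition() splits at the first "_" only. If there is no "_",
    # head is the whole string, so both URL forms are handled.
    head, _sep, _rest = url.partition("_")
    return head


print(waymark_base("http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park"))
print(waymark_base("http://www.waymarking.com/waymarks/WM6N3G"))
```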
A: 

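.*? is a lazy (non-greedy) quantifier: it matches zero or more characters, but as few as possible, so it matches nothing unless the rest of the pattern forces it to. Make the suffix an explicitly optional group instead: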

(http://www\.waymarking\.com/waymarks/[^_]+)(_.*)?
rezzif
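A small sketch of the optional-group pattern in Python 3. It uses re.fullmatch (Python 3.4+) so that the whole URL must fit the pattern; that choice is an assumption, and re.match would also work here. Group 2 holds the "_..." suffix, or None when there is none.

```python
import re

# Group 1: the base URL. Group 2: the optional "_..." suffix.
pattern = re.compile(r"(http://www\.waymarking\.com/waymarks/[^_]+)(_.*)?")

for url in (
    "http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park",
    "http://www.waymarking.com/waymarks/WM6N3G",
):
    m = pattern.fullmatch(url)
    print(m.group(1), m.group(2))  # group(2) is None for the short URL
```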
A: 

What about this:

(http://www\.waymarking\.com/waymarks/[a-zA-Z0-9]*)_?.*?
Chris Thompson
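A quick check of the character-class answer, assuming waymark IDs are always ASCII letters and digits. It shows that the trailing _?.*? adds nothing useful: the base URL is in group 1, while the whole match (group 0) also includes the underscore consumed by _?.

```python
import re

pattern = re.compile(r"(http://www\.waymarking\.com/waymarks/[a-zA-Z0-9]*)_?.*?")

m = pattern.match("http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park")
print(m.group(0))  # whole match: ends with the "_" that _? consumed
print(m.group(1))  # captured base URL, without the "_"
```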
A: 

If it is inline:

.*(http://www\.waymarking\.com/waymarks/WM6N3G).*

.*? is non-greedy, so in this case it gives up everything and matches the empty string. Because _? and the trailing .*? may also match nothing, your group ends right after "waymarks/".

Sean Vieira
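To make the non-greedy explanation concrete, this sketch runs the question's pattern next to a greedy version on the long URL (dots escaped in both). Lazy .*? stops as early as possible, greedy .* runs to the end of the string, and neither stops at the "_", which is why the answers use a negated class such as [^_] instead.

```python
import re

url = "http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park"

# Lazy: .*? matches as little as possible, and since _?.*? can also match
# nothing, the group ends immediately after "waymarks/".
lazy = re.match(r"(http://www\.waymarking\.com/waymarks/.*?)_?.*?", url)
print(lazy.group(1))    # http://www.waymarking.com/waymarks/

# Greedy: .* consumes the rest of the string, so the group is the whole URL.
greedy = re.match(r"(http://www\.waymarking\.com/waymarks/.*)_?.*?", url)
print(greedy.group(1))  # http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park
```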
+1  A: 

How about

(http://www\.waymarking\.com/waymarks/[^_]+)
Vinay Sajip
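Since this pattern is not anchored, it can also pull URLs out of running text with re.findall. The sketch below adds \s to the negated class, a tweak that is not part of the answer, so a match cannot run past a URL into the following words; it assumes URLs are separated from the surrounding text by whitespace, and the sample text is made up.

```python
import re

text = (
    "See http://www.waymarking.com/waymarks/WM6N3G_Battle_Mountain_State_Park "
    "and http://www.waymarking.com/waymarks/WM6N3G for details."
)

# With one capturing group, findall returns the group's text for each match.
# Excluding whitespace as well as "_" stops each match at the end of its URL.
pattern = re.compile(r"(http://www\.waymarking\.com/waymarks/[^_\s]+)")
print(pattern.findall(text))
```

Trailing punctuation directly after a URL (for example a closing parenthesis) would still be captured, so a real extractor may need a tighter character class.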