ansaurus

Question

Capturing part of a url

Answer 1

A:

For most regex engines you probably want [^#] (a leading ^ inside the brackets negates the character class).
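A quick illustration of the negated class, assuming Python's re module (the answer itself is engine-agnostic): [^#] matches any single character except '#', so [^#]+ consumes a run of characters up to the first '#'.

```python
import re

# [^#] matches any single character that is NOT '#'.
# With + it matches the longest run of such characters from the start.
text = "important-stuff-here#ignorable-stuff"

run = re.match(r"[^#]+", text)
print(run.group(0))  # important-stuff-here

# Only the listed character '#' is excluded; everything else matches.
print(re.findall(r"[^#]", "a#b"))  # ['a', 'b']
```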

ccmonkey 2010-02-18 03:26:51

Answer 2

A:

"Anything but" is called a negated character class, and, in your case, is spelled

[^#]

Your regex would be

http://www\.a\.com/farms/([^#]+)
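A minimal sketch of that regex in use, assuming Python's re module. The dots in the host are escaped so they match a literal "." rather than any character, and group 1 holds the captured part. The helper name extract_farm is hypothetical.

```python
import re

# Dots are escaped so "www.a.com" is matched literally.
# ([^#]+) captures everything after /farms/ up to (not including) the first '#'.
FARM_RE = re.compile(r"http://www\.a\.com/farms/([^#]+)")

def extract_farm(url):
    """Return the part after /farms/ (up to any '#'), or None if the URL doesn't match."""
    m = FARM_RE.match(url)
    return m.group(1) if m else None

print(extract_farm("http://www.a.com/farms/important-stuff-here#ignorable-stuff"))
# important-stuff-here
print(extract_farm("http://www.a.com/farms/no-fragment"))
# no-fragment
```

Because the class is negated rather than tied to a fragment, the same pattern also works for URLs that have no "#" at all.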

Jonathan Feinberg 2010-02-18 03:26:55

Answer 3

A:

Depending on your language, you might want to use a module or library that parses URLs for you. For example, in PHP you can use parse_url():

$url = "http://www.a.com/farms/important-stuff-here#ignorable-stuff";
$parsed = parse_url($url);
print $parsed['path'];

With Python, use urlparse() (in Python 3 it lives in urllib.parse), e.g.:

>>> from urllib.parse import urlparse
>>> s = "http://www.a.com/farms/important-stuff-here#ignorable-stuff"
>>> urlparse(s).path
'/farms/important-stuff-here'

If you really want to do it by hand, first delete everything from "#" onwards, then delete everything from the start up to the last "/":

$ echo "http://www.a.com/farms/important-stuff-here#ignorable-stuff" | sed 's/#.*//;s|.*/||'
important-stuff-here
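The same two substitutions, sketched with Python's re.sub as an assumed equivalent of the sed commands. count=1 mirrors sed's default of replacing only the first match on a line.

```python
import re

url = "http://www.a.com/farms/important-stuff-here#ignorable-stuff"

# Step 1 (like s/#.*//): delete from the first '#' to the end of the string.
no_fragment = re.sub(r"#.*", "", url, count=1)

# Step 2 (like s|.*/||): '.*' is greedy, so this deletes everything
# up to and including the LAST '/'.
last_segment = re.sub(r".*/", "", no_fragment, count=1)

print(last_segment)  # important-stuff-here
```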

Or, using plain string splits:

$url = "http://www.a.com/farms/important-stuff-here#ignorable-stuff";
$s = explode("#",$url,2);
$t = explode("/",$s[0]);
print end($t);
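For comparison, a rough Python counterpart of the explode() version, assuming plain str.split: split once on "#" and keep the first piece, then take the last "/"-separated piece.

```python
url = "http://www.a.com/farms/important-stuff-here#ignorable-stuff"

# Like explode("#", $url, 2): split at most once on '#' and keep the part before it.
before_hash = url.split("#", 1)[0]

# Like explode("/", ...) followed by end(): take the last '/'-separated piece.
last = before_hash.split("/")[-1]

print(last)  # important-stuff-here
```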

ghostdog74 2010-02-18 03:28:49

Won't the path also include "/farms/"? I just want important-stuff-here, thank you.

2010-02-18 03:48:40

That's easy to fix: explode/split the path on "/" and take the last item. I'll leave that to you.

ghostdog74 2010-02-18 04:22:36
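A minimal sketch of the fix suggested in the comment above, assuming Python 3's urllib.parse: take the parsed path (which already excludes the fragment), drop any trailing "/", and keep the last segment. The function name last_path_segment is hypothetical.

```python
from urllib.parse import urlparse

def last_path_segment(url):
    """Return the last '/'-separated segment of the URL's path, ignoring a trailing '/'."""
    path = urlparse(url).path  # e.g. '/farms/important-stuff-here'; the fragment is not included
    return path.rstrip("/").split("/")[-1]

print(last_path_segment("http://www.a.com/farms/important-stuff-here#ignorable-stuff"))
# important-stuff-here
```

Stripping the trailing "/" first means a URL such as http://www.a.com/farms/x/ still yields x instead of an empty string.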
