I have a string in Perl like: "Full Name (userid)" and I want to return just the userid (everything between the "()"'s).

What regular expression would do this in Perl?

+4  A: 

This will match one or more word (\w) characters between "(" and ")".

\w matches a word character (alphanumeric or _), not just [0-9a-zA-Z_] but also digits and characters from non-roman scripts.

my($username) = $str =~ /\((\w+)\)/;
# or
$str =~ /\((\w+)\)/;
my $username  = $1;


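If you need it in an s///, use $1 in the replacement. \1 also works there, but it is really the backreference syntax for use inside the pattern, and Perl warns about it in the replacement under use warnings.

$str =~ s/\((\w+)\)/$1:\1/; # pointless example; the \1 triggers a warning here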
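A minimal sketch of where each form belongs, using made-up inputs: $1 carries the captured text into the replacement, while \1 refers back to the capture from inside the pattern itself. The variable names and the 'bookkeeper' string are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input, for illustration only.
my $str = 'Full Name (userid)';

# $1 in the replacement: swap the parentheses for square brackets.
(my $copy = $str) =~ s/\((\w+)\)/[$1]/;
print "$copy\n";     # Full Name [userid]

# \1 inside the pattern: find the first word character that is doubled.
my ($doubled) = 'bookkeeper' =~ /(\w)\1/;
print "$doubled\n";  # o
```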


If you want to capture all possibilities these would work better:

my($username) = $str =~ /\(([^\)]+)\)/;
# or
my($username) = $str =~ /\((.+?)\)/;


If your regexp starts to get complicated, I would recommend you learn about the /x option.

my($username) = $str =~ / \(  ( [^\)]+ )  \) /x;


Please see perldoc perlre for more information.

If you are just beginning to learn regexps, I would recommend reading perldoc perlretut.

Brad Gilbert
Surely if he wants what's *between* the parentheses, it would be /(\(\w+\))/ ?
IRBMe
Assuming he doesn't need the `()` my answer is correct.
Brad Gilbert
+4  A: 

Escape the brackets, capture the string in-between. Assuming user ids consist of \w characters only:

my ($userid) = $str =~ /\((\w+)\)/ ;

m// in list context returns the captured matches.
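A short sketch of what the context changes, assuming the sample string from the question. The parentheses around the variable on the left are what put the match in list context; the variable names here are hypothetical.

```perl
use strict;
use warnings;

my $str = 'Full Name (userid)';   # sample input from the question

my ($in_list) = $str =~ /\((\w+)\)/;  # list context: receives the capture
my $in_scalar = $str =~ /\((\w+)\)/;  # scalar context: receives true/false

print "list:   $in_list\n";    # list:   userid
print "scalar: $in_scalar\n";  # scalar: 1
```

If the match fails, the list assignment leaves the variable undefined, so it is worth checking with defined before using it.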

More information on capturing can be found in

C:\> perldoc perlretut

Sinan Ünür
So just by putting in the extra parentheses it returns the value? (The (\w+) vs. just \w+?)
Brian
That is called capturing. If the match is evaluated in list context (e.g. `my ($userid) = ` as opposed to `my $userid = `), all the captured matches will be returned. In this case, there is only one.
Sinan Ünür
+1  A: 

This will get anything between the parentheses and not just alphanumeric and _. This may not be an issue, but \w will not get usernames with dashes, pound signs, etc.

$str =~ /\((.*?)\)/ ;
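To make the difference concrete, here is a small sketch with a hypothetical userid that contains a dash. The input string and variable names are assumptions, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical userid containing a dash, for illustration only.
my $str = 'Full Name (j-smith)';

my ($word_only) = $str =~ /\((\w+)\)/;  # \w+ cannot cross the dash
my ($anything)  = $str =~ /\((.*?)\)/;  # .*? takes everything up to ")"

print defined($word_only) ? $word_only : 'no match', "\n";  # no match
print "$anything\n";                                        # j-smith
```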

RC
+3  A: 

When you search for something between brackets, e.g. '< > [ ] ( ) { }', or something more sophisticated such as XML/HTML tags, it's always better to construct your pattern this way:

opening bracket, something which is NOT closing bracket, closing bracket

Of course, in your case 'closing bracket' can be omitted:

my $str = 'Full Name (userid)';
my ($user_id) = $str =~ /\(([^\)]+)/;
zakovyrya
umm, doesn't the non-greedy operator (the ? in .*? in RC's example) eliminate the need for this pattern? The non-greedy operator will cause the regex to match at the first closing parens, rather than the ultimate one. Your approach is more compatible, since not all regex implementations have a non-greedy operator, but for those that do (like Perl), I find it easier to read than your negation pattern. Just a style thing....
Val
This approach is safer. If you want to use it inside a more complex regular expression, you can easily be bitten by backtracking if you rely on a non-greedy pattern.
Hynek -Pichi- Vychodil
Actually it's not only safer, but more efficient, because backtracking to satisfy non-greediness is eliminated
zakovyrya
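One way to see the "bitten by backtracking" point from the comments above is a sketch like the following. The input string and the extra " d" after the closing parenthesis are made up for illustration; the idea is that once more pattern follows the group, the non-greedy version is free to run past a ")" to make the rest match.

```perl
use strict;
use warnings;

# Hypothetical input with two parenthesised groups, for illustration only.
my $str = '(a) b (c) d';

# Goal: the group that is immediately followed by " d".
my ($lazy)    = $str =~ /\((.*?)\) d/;    # .*? is allowed to run past ")"
my ($negated) = $str =~ /\(([^)]*)\) d/;  # [^)]* can never contain ")"

print "$lazy\n";     # a) b (c
print "$negated\n";  # c
```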
+2  A: 

In addition to what has been said: if you happen to know that your string has exactly this format, you can also do without a regexp. If your string is in $s, you could do

chop $s; # throws away last character (by assumption must be closing parenthesis)
$username=substr($s, rindex($s,'(') + 1);
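As a rough sketch, the same idea can be wrapped in a small helper that checks the format assumption instead of trusting it. The sub name userid_no_regex is hypothetical, and unlike chop it leaves the original string untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the last "(" and a
# trailing ")", or undef if the string does not have that shape.
sub userid_no_regex {
    my ($s) = @_;
    return unless length($s) && substr($s, -1) eq ')';  # must end with ")"
    my $open = rindex($s, '(');
    return if $open < 0;                                # must contain "("
    return substr($s, $open + 1, length($s) - $open - 2);
}

print userid_no_regex('Full Name (userid)'), "\n";  # userid
```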

As for the regexp solutions, can you be sure that the full name cannot also contain a pair of parentheses? If it can, it might make sense to anchor the closing ')' at the end of the pattern:

$s =~ / [(]     # open paren
 ([^(]+)  # at least one non-open paren
  [)]     # closing paren
  $       # end of string
/x and $username = $1;  # "and", not "&&": && binds tighter than = and will not compile here
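A brief sketch of why the anchor matters, assuming a made-up full name that itself contains parentheses:

```perl
use strict;
use warnings;

# Hypothetical input where the full name itself contains parentheses.
my $s = 'John (Johnny) Smith (jsmith)';

my ($first) = $s =~ /\((\w+)\)/;       # unanchored: the first pair wins
my ($last)  = $s =~ /[(]([^(]+)[)]$/;  # anchored: only the pair at the end

print "$first\n";  # Johnny
print "$last\n";   # jsmith
```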