+19  A: 

I find the YAPE::Regex::Explain module very helpful:

C:\>perl -e "use YAPE::Regex::Explain; print YAPE::Regex::Explain->new(qr/['-]/)->explain;"
The regular expression:

(?-imsx:['-])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ['-]                     any character of: ''', '-'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------



C:\>perl -e "use YAPE::Regex::Explain; print YAPE::Regex::Explain->new(qr/(\w+), ?(.)/)->explain;"
The regular expression:

(?-imsx:(\w+), ?(.))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ,                        ','
----------------------------------------------------------------------
   ?                       ' ' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

C:\>
Ed Guiness
Whoa, hold on, what the hay is all this? I appreciate the help, but this is looking and reading weird... what's with all the ---------------------?
CheeseConQueso
Never mind... it just came up as preformatted code... last time I viewed it, it was regular formatted.
CheeseConQueso
It's output from YAPE::Regex::Explain, and it will look better on your command line. The point is that there is a neat tool to help explain regexes.
Ed Guiness
Yeah, that does look helpful.
CheeseConQueso
+1  A: 

1st line: the characters inside [] (' and -) are matched and replaced (s) with nothing, thus removed. /g means global: every occurrence in the string is replaced, not just the first one.

2nd line: \w means a word character, + means one or more times, ? means zero or one time, and "." means any single character except a newline. So it finds a run of one or more word characters, followed by a comma, followed by an optional space, followed by any one character.
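A minimal sketch of the two lines applied one at a time, so you can see the intermediate string and what each group captures. The sample name "O'Hanlon, Mary" is an assumption for illustration, not data from the question.

```perl
use strict;
use warnings;

my $rowfetch = "O'Hanlon, Mary";   # made-up sample row

# Line 1: delete every ' and - in the string (/g = all occurrences).
$rowfetch =~ s/['-]//g;

# Line 2: (\w+) takes one or more word characters, then a literal comma,
# " ?" an optional space, and (.) exactly one character (not a newline).
my ($last, $initial) = $rowfetch =~ /(\w+), ?(.)/;

print "$rowfetch\n";   # OHanlon, Mary
print "$last\n";       # OHanlon
print "$initial\n";    # M
```

In list context the match returns the captured groups directly, which is the same data you would otherwise read from $1 and $2.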

Loki
+8  A: 

I keep one of these cheat sheets pinned on my cube wall for just such occasions. Google for "regular expression cheat sheet" to find others.

To add to what you already know:

  g -- search globally throughout the string
  + -- match at least one, but as many as possible
  ? -- match 0 or 1
  . -- match any character except a newline
 () -- group these together and capture what they match
  , -- a plain comma, no special meaning
 [] -- match any one character inside the brackets
 \w -- match any word character (letter, digit or underscore)
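To see what the g modifier changes, here is a small sketch running the question's character class with and without /g. The input "O'Hanlon-Smyth" is a made-up name chosen because it contains both characters.

```perl
use strict;
use warnings;

my $name = "O'Hanlon-Smyth";   # made-up sample

# Without /g, s/// stops after the first match (the apostrophe).
(my $once = $name) =~ s/['-]//;

# With /g, every match in the string is replaced.
(my $all = $name) =~ s/['-]//g;

print "$once\n";   # OHanlon-Smyth
print "$all\n";    # OHanlonSmyth
```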

The magic is in the grouping -- the match expression uses the groups and puts them into variables $1 and $2. In this case $1 matches the word before the comma and $2 matches the first character following the whitespace after the comma.
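A quick sketch of the grouping at work: each (...) fills the next numbered variable, and because of the " ?" the same values come out whether or not there is a space after the comma. Both input strings are invented examples.

```perl
use strict;
use warnings;

my @got;
for my $row ("Smith, John", "Smith,John") {
    if ($row =~ /(\w+), ?(.)/) {
        # $1 is the text matched by the first (...) group, $2 by the second.
        push @got, "$1|$2";
    }
}
print "$_\n" for @got;   # prints "Smith|J" twice
```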

tvanfosson
Yeah, I promptly removed that from my "knowns" when I found out, haha. Foolish.
CheeseConQueso
Ahh, that's how $1 and $2 get set... thanks.
CheeseConQueso
Just a small addition: the space after the comma is optional (due to the ?).
Dashogun
@Dashogun: Correct, but his example has the whitespace in it.
tvanfosson
+1  A: 
$lhs =~ s/foo/bar/g;

s/// is Perl's substitution operator, which modifies the string it is bound to: you match the LHS against the first part on the right (foo), and the second part (bar) specifies what replaces each match. So "Lafooey" becomes "Labarey".
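A tiny sketch of the "Lafooey" example, also showing that s/// changes the variable in place and, in scalar context, returns the number of replacements it made.

```perl
use strict;
use warnings;

my $lhs = "Lafooey";

# s/// edits $lhs in place; in scalar context it returns how many
# replacements it made (here just one "foo").
my $count = ($lhs =~ s/foo/bar/g);

print "$lhs\n";    # Labarey
print "$count\n";  # 1
```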

In your question, the aim is to remove every ' and -, as in "O'Hanlon" and "Chalmonly-Witherington-Smyth".

Then it matches "Lastname, Firstname", capturing the last name and the first character of the first name. The parentheses put these captured values into the variables $1 and $2.

Finally, it prints the lowercase of "F" + "Lastname", because those are the values in $2 and $1 respectively.

At the end of it, you have a viable username for a system based upon the person's real name from a telephone directory style listing.
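A sketch of the whole pipeline over a couple of directory-style rows. The rows are invented for illustration (reusing the names above), and printing to STDOUT stands in for the question's $fh.

```perl
use strict;
use warnings;

# Made-up "Lastname, Firstname" rows for illustration.
my @rows = ("O'Hanlon, Mary", "Chalmonly-Witherington-Smyth, Fred");
my @usernames;

for my $row (@rows) {
    (my $rowfetch = $row) =~ s/['-]//g;      # remove ' and -
    if ($rowfetch =~ /(\w+), ?(.)/) {         # $1 = surname, $2 = initial
        push @usernames, lc($2 . $1);         # initial + surname, lowercased
    }
}

print "$_\n" for @usernames;
# mohanlon
# fchalmonlywitheringtonsmyth
```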

JeeBee
+1  A: 

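The =~ is the binding operator: it applies the match (or substitution) on its right to the string on its left, instead of to the default $_. It does not mean "make equal", and a lone "~" is not a match test (in Perl it is bitwise negation).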

annakata
+1  A: 

The =~ binds the expression (string) on its left-hand side to the regular expression on its right-hand side. With a plain match, as in your second line, it does not modify the string (a substitution such as s/// on your first line does). As a side effect, a successful match sets the variables $1, $2, ... to the bracketed parts matched.

In your case, the first bracketed group, "(\w+)", matches word characters repeated one or more times (the last name), and the second, "(.)", matches a single character (the first letter of the given name). The " ?" matches an optional space.
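A short sketch of the =~ behaviour described in this answer: a bound match leaves the string alone and just reports success (setting $1, $2), while a bound substitution rewrites it. The sample string is an assumption.

```perl
use strict;
use warnings;

my $name = "Smith, John";   # made-up sample

# A match bound with =~ only inspects $name: it returns true or false
# and, on success, sets $1, $2, ...
my $matched = ($name =~ /(\w+), ?(.)/) ? 1 : 0;
my ($surname, $initial) = ($1, $2);   # copy now; the next successful match replaces them
my $unchanged = $name;                # still "Smith, John"

# A substitution bound with =~ does rewrite the string on its left.
$name =~ s/, ?/ /;

print "$matched $surname $initial\n";  # 1 Smith J
print "$unchanged\n";                  # Smith, John
print "$name\n";                       # Smith John
```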

Diomidis Spinellis
+3  A: 

Download "The Regex Coach" and explore it. Consider purchasing "Mastering Regular Expressions" as it will walk you through this minefield. It is one of the best-typeset books I've ever seen and is deeply informative yet penetrable.

+1  A: 

Note that the given code fails miserably if the input isn't in the right format. Here's what I would do:

$rowfetch =~ s/[ '-]//g;    # All chars inside the [ ] (space, ' and -) are removed.
if ($rowfetch =~ m/(\w+),([a-z])/i) {    # only use $1/$2 if this match succeeded
    print $fh lc($2 . $1);
}

The capture variables $1, $2, ... hold the results of the last successful match, but they are not reset when a match fails. This means that if the regex fails to match, $1 and $2 still hold the values from an earlier match (for example, the previous row), and you'll end up with something other than what you wanted.
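Here is a sketch of that pitfall, followed by the guarded version. The two rows are made up: one well-formed, one with no comma.

```perl
use strict;
use warnings;

my $good = "Smith, John";
my $bad  = "nocomma";          # a malformed row

$good =~ /(\w+), ?(.)/;        # succeeds: $1 = "Smith", $2 = "J"
$bad  =~ /(\w+), ?(.)/;        # fails, but $1 and $2 are NOT reset
my $stale = lc($2 . $1);       # "jsmith": the previous row's username

# Guarded version: use $1/$2 only when this match succeeded.
my $username;
if ($bad =~ /(\w+), ?(.)/) {
    $username = lc($2 . $1);
}

print "$stale\n";                               # jsmith
print "no match\n" unless defined $username;    # no match
```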

I've also altered the regex slightly. The first line now also removes spaces: since it appears that you are creating usernames or email addresses, you don't want spaces. The second line is stricter, to ensure that $2 is a letter and not some other character. The 'i' at the end tells Perl to make all letter matches case-insensitive; with it, I don't have to write the second part as ([a-zA-Z]).
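To compare the two versions side by side, here is a sketch that wraps each in a sub. The names original_username and strict_username are hypothetical helpers made up for this comparison, and both return undef on a failed match so the stale-capture problem doesn't muddy the results. The inputs are invented: a surname containing spaces, and a row with junk after the comma.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the question's two lines.
sub original_username {
    my ($row) = @_;
    $row =~ s/['-]//g;
    return $row =~ /(\w+), ?(.)/ ? lc($2 . $1) : undef;
}

# Hypothetical helper wrapping the stricter version from this answer.
sub strict_username {
    my ($row) = @_;
    $row =~ s/[ '-]//g;
    return $row =~ /(\w+),([a-z])/i ? lc($2 . $1) : undef;
}

for my $row ("van der Berg, Anna", "Smith, .John") {
    my $orig   = original_username($row) // "no match";
    my $strict = strict_username($row)   // "no match";
    print "$row => $orig / $strict\n";
}
# van der Berg, Anna => aberg / avanderberg
# Smith, .John => .smith / no match
```

With spaces left in, \w+ can only grab the last word of a multi-word surname; removing them first keeps the whole surname, and ([a-z]) rejects a non-letter where the initial should be.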

Thanks... I'll keep this in mind.
CheeseConQueso
+1  A: 

There is a great web front end to YAPE::Regex::Explain.

Here is the explanation of s/['-]//g

and for m/(\w+), ?(.)/

drewk