Most of my users have email addresses associated with their profile in /etc/passwd. They are always in the 5th field, which I can grab, but they appear at different places within a comma-separated list in the 5th field.

Can somebody give me a regex to grab just the email address (delimited by commas) from a line in this file? (I will be using grep and sed from a bash script.)

Sample lines from file:

user1:x:1147:5005:User One,Department,,,[email protected]:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,[email protected],:/home/directory:/bin/bash
+6  A: 

A standard email regular expression should work fine:

http://regexlib.com/DisplayPatterns.aspx

You can also try out the excellent http://txt2re.com website!

Clint Ecker
+5  A: 

What about:

,([^@]+@[^,:]+)

Where the group contains the email address.

[Updated based upon comment that address doesn't always get terminated by a comma]

Ray Hayes
The field only sometimes ends with a comma
Brent
So always prepend and append a comma before using the RegEx.
Craig Trader
Or replace [^,] with [^,:] - I think that's simpler
Brent
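A minimal sketch of how the pattern from this answer and Brent's comment might be used with grep. The sample addresses are made-up placeholders, and the first character class is narrowed to [^,:@] (an assumption beyond the answer) so the match cannot run backwards across earlier commas. grep -o is widely available in GNU and BSD grep but is not required by POSIX.

```shell
# Sample lines modelled on the question; the addresses are hypothetical placeholders.
sample='user1:x:1147:5005:User One,Department,,,user1@example.com:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,user2@example.com,:/home/directory:/bin/bash'

# cut isolates the fifth colon-separated field; grep -o prints only the
# comma-delimited piece that contains an "@".
emails=$(printf '%s\n' "$sample" | cut -d: -f5 | grep -oE '[^,:@]+@[^,:]+')
printf '%s\n' "$emails"
```

Against the real file, the printf would be replaced by reading /etc/passwd directly.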
+1  A: 

BTW The fifth field's known as the GCOS field. Sometimes spelt GECOS.

Rob Wells
+1  A: 

Search for all email-valid-characters before and after the @ sign. Like:

[-A-z0-9.]+@[-A-z0-9.]+

Greedy matching should pull in everything it can, and it'll stop at the commas or colons.

Check which characters are valid in email addresses, though. I've left some out (like +)

JBB
Underscore is valid too...
Ray Hayes
Probably easier to state what you don't want rather than try to work out what is valid. In this case he didn't want commas (if a comma is valid in an email address then I think he's out of luck for RegExpr). [^,]+ will do in this case.
Ray Hayes
Actually I put underscores in there. That's why the ]+@[-A-z0-9. is italicized. :)
JBB
Actually there are other characters besides '_' that are legal. See RFC 2821 and RFC 2822 for details.
Craig Trader
You can keep the right hand side (after the @) as [-A-Za-z0-9.]+ , since FQDNs can only consist legally of those characters. The left hand side has a much broader set of legal characters, per the RFCs.
nsayer
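As a side note on the underscore discussion, here is a small sketch showing that in the C locale the range A-z covers more than letters, which is why "_" is matched by it. The second command tries an explicit class along the lines nsayer suggests. The exact set of characters allowed before the @ is an assumption here, not a full RFC implementation, and the address is a placeholder.

```shell
# LC_ALL=C makes the range follow ASCII byte order, where A-z spans 'A' (65)
# through 'z' (122) and so also covers [ \ ] ^ _ and the backtick.
# Count how many of these four one-character lines match [A-z].
matched=$(printf '%s\n' '_' '^' 'a' '1' | LC_ALL=C grep -c '^[A-z]$')
echo "$matched"

# Explicit classes: a broader (assumed) set on the left of the @,
# hostname characters only on the right.
addr=$(printf '%s\n' 'User Two,Department2,user2@example.com,' |
    grep -oE '[-A-Za-z0-9._+]+@[-A-Za-z0-9.]+')
echo "$addr"
```

Spelling out A-Za-z avoids silently accepting the punctuation that sits between the two letter ranges.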
A: 
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

should catch most emails

Unkwntech
+1  A: 

Actually, this looks like a perfect job for Awk. Now, like most people I will say "I'm no expert in Awk" before proceeding...

awk -F : '{print $5}' /etc/passwd

would get the 5th field where ':' is the field separator from /etc/passwd - it's probably the 5th field you are wanting.

awk -F , '{print $1}'

would get the 1st field from standard input where ',' was the delimiter, so

awk -F : '{print $5}' /etc/passwd | awk -F , '{print $1}'

would get the first comma separated field (the Name field) from the fifth colon separated field (the field with all that kind of cruft in it!) in your /etc/passwd file.

Adjust the print $1 to get the field with your emails in it.

Doubtless there is a way to do this without the pipe in Awk. I use Awk for splitting out fields in things and not much else. I find it confusing, and that's from somebody that loves regular expressions...

reefnet_alex
This will only work if the address is always in the same comma-delimited field, which, as the question states, it is not.
Brent
This is true, I had seen different places but not interpreted it as in different comma delimited fields, but looking at the example it all becomes clear. My bad.
reefnet_alex
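One possible way to do it in a single Awk call, and to cope with the address moving between comma-delimited sub-fields, is to split the fifth field and print whichever piece contains an "@". This is only a sketch: the sample addresses are placeholders, and "contains an @" is a deliberately loose test rather than real email validation.

```shell
# Sample lines modelled on the question; the addresses are hypothetical placeholders.
sample='user1:x:1147:5005:User One,Department,,,user1@example.com:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,user2@example.com,:/home/directory:/bin/bash'

# -F: splits each line on colons; split() then breaks field 5 on commas,
# and the loop prints every sub-field that contains an "@".
emails=$(printf '%s\n' "$sample" | awk -F: '{
    n = split($5, parts, ",")
    for (i = 1; i <= n; i++)
        if (parts[i] ~ /@/) print parts[i]
}')
printf '%s\n' "$emails"
```

Lines whose fifth field has no "@" produce no output at all.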
+1  A: 
sed -r -e "s/^.*[,:]([^,:]+@[^,:]+).*$/\1/g" /etc/passwd

Will do the trick

Brent
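One thing to watch with the command above: a line with no address does not match, so sed prints it unchanged. A possible variation, sketched here with placeholder addresses, uses -n with the p flag so that only lines where the substitution succeeded are printed. It uses -E instead of -r, on the assumption that your sed accepts it (current GNU and BSD versions do).

```shell
# The third sample line has no address; all addresses are hypothetical placeholders.
sample='user1:x:1147:5005:User One,Department,,,user1@example.com:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,user2@example.com,:/home/directory:/bin/bash
user3:x:1149:5002:User Three,Department3,,:/home/directory:/bin/bash'

# -n suppresses the default output; the trailing p prints a line only
# when the substitution actually matched.
emails=$(printf '%s\n' "$sample" |
    sed -n -E 's/^.*[,:]([^,:]+@[^,:]+).*$/\1/p')
printf '%s\n' "$emails"
```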
A: 

How about the standard RFC 2822:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Yep. That's it. :)

Marcio Aguiar
... actually, a full implementation of that RFC is somewhat more... complex: http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
dsm
A: 
sed 's/,*:\/.*//;s/^.*://;s/.*,//' /etc/passwd
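
To see what the three substitutions do, here is a sketch that runs them one at a time on a sample line (the addresses are placeholders). It also shows the assumption this approach appears to rely on: the address has to be the last non-empty comma-delimited sub-field.

```shell
line='user2:x:1148:5002:User Two,Department2,user2@example.com,:/home/directory:/bin/bash'

# Step 1: drop any trailing commas plus everything from ":/" (the home directory) onward.
step1=$(printf '%s\n' "$line" | sed 's/,*:\/.*//')
# Step 2: drop everything up to the last remaining colon, leaving the fifth field.
step2=$(printf '%s\n' "$step1" | sed 's/^.*://')
# Step 3: drop everything up to the last comma, leaving the final sub-field.
step3=$(printf '%s\n' "$step2" | sed 's/.*,//')
printf '%s\n' "$step1" "$step2" "$step3"

# When another sub-field follows the address, that sub-field is returned instead.
other='user3:x:1149:5002:User Three,user3@example.com,Department3:/home/directory:/bin/bash'
missed=$(printf '%s\n' "$other" | sed 's/,*:\/.*//;s/^.*://;s/.*,//')
printf '%s\n' "$missed"
```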