views:

390

answers:

4

My string of text looks like this:

username@example.com (John Doe)

I need to get just the part before the @ and nothing else. The text comes from a simple XML object, if that matters.

The code I have looks like this:

$authorpre = $key->{"author"};
$re1='((?:[a-z][a-z]+))';

if ($c=preg_match_all ("/".$re1."/is", $authorpre, $matches))
{
    $author=$matches[1][0];
}

Sometimes the username contains digits or an underscore before the @ symbol, and that seems to be where the regex stops matching.

+13  A: 

This regular expression matches and captures every character up to the first @ character:

([^@]+)

That seems to be what you need. It copes with digits, underscores, dots and the other characters that commonly appear before the @, which is exactly where your current pattern stops.
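In PHP, the pattern could be used with `preg_match()` roughly like this. This is a sketch: the function name `local_part_regex` is made up for illustration, and the `^` anchor plus the trailing `@` are my additions, so that a string without any @ returns `null` instead of the whole string.

```php
<?php
// Sketch: capture everything before the first "@" using ([^@]+).
// The ^ anchor starts the match at the beginning of the string, and the
// trailing @ makes the match fail when the string has no "@" at all.
function local_part_regex(string $author): ?string
{
    if (preg_match('/^([^@]+)@/', $author, $matches) === 1) {
        return $matches[1];
    }
    return null; // no "@" found
}

echo local_part_regex('john_doe42@example.com (John Doe)'), "\n"; // john_doe42
```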


I'm not sure why Ben James deleted his answer, since I feel it's better than mine. I'm going to post it here (unless he undeletes his answer):

Why use regex instead of string functions?

$parts = explode("@", "username@example.com");
$username = $parts[0];

You don't need regular expressions in this situation at all. I think using explode is a much better option, personally.
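If you go the string-function route, `strstr()` with its third argument (`$before_needle`) set to `true` returns everything before the first @ in a single call, so no intermediate array is needed. A minimal sketch with a sample author string; the fallback for a missing @ is an assumption chosen to mirror what `explode()` would give you (the whole string).

```php
<?php
// Sketch: strstr() with $before_needle = true returns the part of the
// haystack before the FIRST occurrence of the needle, or false if the
// needle does not occur at all.
$author = 'john_doe42@example.com (John Doe)';

$username = strstr($author, '@', true);
if ($username === false) {
    $username = $author; // no "@": keep the whole string, as explode() would
}

echo $username, "\n"; // john_doe42
```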


As Johannes Rössel points out in the comments, e-mail address parsing is rather complicated. If you want to be 100% sure that you will be able to handle any technically-valid e-mail address, you're going to have to write a routine that will handle quoting properly, because both solutions listed in my answer will choke on addresses like "a@b"@example.com. There may be a library that handles this kind of parsing for you, but I am unaware of it.
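A minimal sketch of such a routine, assuming only two quoting rules: a double quote toggles a quoted section, and a backslash escapes the character after it (applied everywhere here, which is a simplification). It is not a full RFC 5322 parser: comments, folding whitespace and domain literals are ignored, and the function name `local_part_unquoted` is hypothetical.

```php
<?php
// Sketch (not a full RFC 5322 parser): return everything before the first "@"
// that is not inside a double-quoted section. A backslash skips the next
// character, so an escaped quote does not end the quoted section.
function local_part_unquoted(string $address): ?string
{
    $inQuotes = false;
    $len = strlen($address);
    for ($i = 0; $i < $len; $i++) {
        $ch = $address[$i];
        if ($ch === '\\') {
            $i++; // skip the escaped character
            continue;
        }
        if ($ch === '"') {
            $inQuotes = !$inQuotes;
        } elseif ($ch === '@' && !$inQuotes) {
            return substr($address, 0, $i);
        }
    }
    return null; // no unquoted "@" found
}

echo local_part_unquoted('"a@b"@example.com'), "\n";                  // "a@b"
echo local_part_unquoted('"j@n(= doe"@example.com (J@ne Doe)'), "\n"; // "j@n(= doe"
```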

Welbog
Depending on how intense your regex can get, I personally like the explode function. It fits well with what you're looking for.
Anthony Forloney
What's with the e-mail address `"a@b"@example.com`?
Joey
The fun never ends with source routes in e-mail addresses: http://www.remote.org/jochen/mail/info/address.html
PP
@Johannes: Is the `@` character allowed in the domain portion of the address? Because, if not, both solutions could still work as long as they look for the *last* `@` character instead of the first.
Welbog
Taking everything before the *last* `@` should work, yes. Unless Jane Doe comes up with the cool idea to use `"j@nedoe"@example.com (J@ne Doe)` ...
Joey
@Johannes: Good catch. So, last `@` character before the first `(` character, then. Does anyone have a counterexample for that one?
Welbog
@Greg: No, because of Johannes' counterexample of the address `"a@b"@example.com`.
Welbog
Maybe `"j@n(= doe"@example.com (J@ne Doe)`? :D
Joey
(Ok, this is getting weird. I'm beginning to understand why e-mail addresses get much more restricted by many e-mail providers.)
Joey
@Johannes: Damn it. Does PHP have any built-in or third-party e-mail parsing libraries?
Welbog
Not that I know of. Still, I think your idea is perfectly fine for roughly 100 % of all e-mail addresses *in use*. But I still want to have one of those addresses one day to pester applications with poor validation routines :-)
Joey
@Johannes: So do I... Just as a subtle protest to whoever it is who came up with such a complicated grammar for e-mail addresses.
Welbog
However, if you just search for the first *unquoted* `@` and take everything before that, it should work.
Joey
@Johannes: Yeah, but deciding whether a character is quoted or not is bordering on the upper limit of regular expressions' domain. It's possible, sure, but it won't be pretty.
Welbog
Oh, and quoting can also be done with a backslash ... Yes, it's definitely unpretty :-)
Joey
To bypass the @ problem, just count the number of items in the array. If it's 2, then the e-mail address does not contain any extra @. If there are more, you just have to take all the items in the array except the last one and join them with an @ :D
AntonioCS
@AntonioCS: That works too, unless there's a `@` in the parenthetical string like Johannes' example `"j@n(= doe"@example.com (J@ne Doe)`. Counting the `@`s will lead you to believe that `ne Doe` is the domain.
Welbog
Your regexp snippet `"[^"]*"` does not correctly match a quoted string, since a quoted string may contain escaped quote characters. For instance, `"contains \"quotes\"" <user@example.com>` is a valid address. It would be better with `"(?:[^"\\]|\\.)*"`.
markusk
@markusk: This is exactly why I wouldn't use a regular expression in this situation. I'm just going to pull it down. People who want to see it can look in the revision history.
Welbog
+1  A: 

I'd go with:

$author = str_replace(strrchr($authorpre, '@'), '', $authorpre);
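For comparison, a sketch that cuts at the last @ by position, using `strrpos()` and `substr()`. One difference worth knowing: `str_replace()` removes *every* occurrence of the tail returned by `strrchr()`, so for a string like `x@y@y` the one-liner yields `x`, while cutting at the last @ yields `x@y`. The helper name `before_last_at` is hypothetical.

```php
<?php
// Sketch: keep everything before the LAST "@". Cutting at a single position
// avoids str_replace() also deleting an identical substring earlier on.
function before_last_at(string $s): string
{
    $pos = strrpos($s, '@');
    return $pos === false ? $s : substr($s, 0, $pos); // no "@": unchanged
}

$authorpre = '"peter@john@doe"@domain.com (John Doe)';
echo before_last_at($authorpre), "\n"; // "peter@john@doe"
```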

Arkh
+1  A: 

You could start by using mailparse_rfc822_parse_addresses to parse the address and extract just the address specification without any display name. Then, you could extract the part before @ with the regexp (.*)@.
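mailparse is a PECL extension rather than part of core PHP, so this sketch leaves the parsing step out and only shows the second step: applying `(.*)@` to an address spec that has already been extracted (as I understand the mailparse documentation, that is the `address` element of each entry returned by `mailparse_rfc822_parse_addresses()`). Because `.*` is greedy, the capture runs up to the *last* @. The function name `local_part_from_spec` is hypothetical.

```php
<?php
// Sketch: extract the local part from a bare address spec (no display name)
// with (.*)@. The greedy .* backtracks only as far as the last "@".
function local_part_from_spec(string $spec): ?string
{
    return preg_match('/^(.*)@/', $spec, $m) === 1 ? $m[1] : null;
}

echo local_part_from_spec('"a@b"@example.com'), "\n"; // "a@b"
```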

markusk
+1  A: 

@OP, if you only want to get everything before @, just use string/array methods. There's no need for a complicated regex. Explode on "@", then remove the last element, which is the domain part, and join the rest back together with "@":

$str = '"peter@john@doe"@domain.com (John Doe)';
$s = explode("@",$str);
array_pop($s); #remove last element.
$s = implode("@",$s);
print $s;

output

$ php test.php
"peter@john@doe"
ghostdog74
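As noted in the comments on the first answer, an @ inside the trailing display name, as in `"j@n(= doe"@example.com (J@ne Doe)`, makes the last @ the wrong split point. A sketch that strips a trailing parenthesised comment before applying the same explode/array_pop/implode steps, assuming the comment is a single, non-nested `(...)` group at the very end of the string; the function name `local_part_strip_comment` is hypothetical.

```php
<?php
// Sketch: drop a trailing "(Display Name)" comment first, then keep everything
// before the last "@" with explode()/array_pop()/implode().
// Assumes the comment is one non-nested parenthesised group at the end.
function local_part_strip_comment(string $str): string
{
    $str = preg_replace('/\s*\([^()]*\)\s*$/', '', $str);
    $parts = explode('@', $str);
    if (count($parts) > 1) {
        array_pop($parts); // remove the domain part
    }
    return implode('@', $parts);
}

echo local_part_strip_comment('"j@n(= doe"@example.com (J@ne Doe)'), "\n"; // "j@n(= doe"
```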