I'm trying to find a regular expression that matches only the alphanumeric words in a string, i.e. words that combine letters and digits. If a word consists purely of digits or purely of letters, I need to discard it.

A: 

This will return every word made up of letters and digits, including pure-letter and pure-digit words, so you would loop through the results and filter them. I don't think regex can do the whole job by itself.

\b[a-z0-9]+\b

Make sure you mark that as case-insensitive.
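
The match-then-filter idea above could be sketched roughly as follows. Python is used purely for illustration, since the question names no language, and `mixed_words` is a hypothetical helper name, not something from the thread.

```python
import re

# The pattern from this answer, compiled case-insensitively as advised.
WORD = re.compile(r"\b[a-z0-9]+\b", re.IGNORECASE)

def mixed_words(text):
    """Return words containing at least one letter and at least one digit."""
    result = []
    for word in WORD.findall(text):
        has_letter = any(ch.isalpha() for ch in word)
        has_digit = any(ch.isdigit() for ch in word)
        # Discard pure-letter and pure-digit words in the loop.
        if has_letter and has_digit:
            result.append(word)
    return result

print(mixed_words("abc 123 a1bc3 1bac3"))  # ['a1bc3', '1bac3']
```

The regex only collects candidates; the letter-and-digit condition is enforced in ordinary code, which is the manual check this answer has in mind.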

Matchu
This will also match `abc` and `123`.
Rubens Farias
I'll try another round of regex-writing, but you can test this manually in the loop.
Matchu
I don't want it to match `abc` and `123`, just `a1bc3` or `1bac3`.
manny
Now throwing my support behind Gumbo's answer.
Matchu
+4  A: 

Try this regular expression:

\b([a-z]+[0-9]+[a-z0-9]*|[0-9]+[a-z]+[a-z0-9]*)\b

Or more compact:

\b([a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b

This matches all words (note the word boundaries `\b`) that start either with one or more letters followed by one or more digits, or vice versa, and that may then continue with zero or more further letters or digits. So the condition of at least one letter and at least one digit is always fulfilled.
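
A quick way to check the compact pattern is sketched below in Python, chosen only for illustration. The `re.IGNORECASE` flag is an assumption carried over from the advice elsewhere in the thread, since the pattern itself lists only `a-z`, and `find_mixed` is a hypothetical helper name.

```python
import re

# Compact pattern from this answer; case-insensitivity is an assumption here.
PATTERN = re.compile(r"\b([a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b", re.IGNORECASE)

def find_mixed(text):
    """Return each whole match (group 0), not just the captured prefix."""
    return [m.group(0) for m in PATTERN.finditer(text)]

print(find_mixed("abc 123 a1bc3 1bac3 F0O 8ar"))
# ['a1bc3', '1bac3', 'F0O', '8ar']
```

`finditer` with `group(0)` is used because `findall` would return only the contents of the capturing group, for example `a1` instead of `a1bc3`.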

Gumbo
Or `\b([a-z]+[0-9]+|[0-9]+[a-z]+)\w*\b` for even more compactness.
tj111
@tj111: `\w` is not just `[A-Za-z0-9]` by definition. Often there are more characters in it like `_` or other word character that are not in ASCII.
Gumbo
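
The difference between `\w*` and `[a-z0-9]*` can be seen in a small Python sketch. This is only an illustration: the names `STRICT` and `LOOSE` are hypothetical, the groups are made non-capturing so that `findall` returns whole matches, and the exact contents of `\w` vary by regex flavor.

```python
import re

# Same alternation, two different tails: explicit class vs. \w.
STRICT = re.compile(r"\b(?:[a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b", re.IGNORECASE)
LOOSE = re.compile(r"\b(?:[a-z]+[0-9]+|[0-9]+[a-z]+)\w*\b", re.IGNORECASE)

sample = "a1_b x2y"
print(STRICT.findall(sample))  # ['x2y']
print(LOOSE.findall(sample))   # ['a1_b', 'x2y']
```

Because `_` counts as a word character, the `\w*` variant accepts `a1_b`, while the explicit class rejects it: there is no word boundary between `1` and `_`.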
It only works if the leading character is a number. Mark's expression works fine.
manny
@manny: It *definitely* works for words that start with a letter too. That’s what the first branch in the alternation is for.
Gumbo
Just checked it again and in fact it does work.
manny
+2  A: 

With lookaheads:

'/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i'

A quick test that also shows example usage:

$str = 'foo bar F0O 8ar';
$arr = array();
preg_match_all('/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i', $str, $arr);
print_r($arr);

Output (matched words only):

F0O
8ar
Mark Byers
A: 
\b(?:[a-z]+[0-9]+|[0-9]+[a-z]+)[[:alnum:]]*\b
Alix Axel