I'm trying to write a regular expression that finds only the alphanumeric words in a string, i.e. words that contain a mix of letters and digits. If a word consists only of digits or only of letters, I need to discard it.
A:
This will return all the individual alphanumeric words, which you can then loop through to discard the all-letter and all-digit ones. I don't think a regex can do the whole job by itself.
\b[a-z0-9]+\b
Make sure you mark that as case-insensitive.
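A minimal sketch of this two-step approach, assuming PHP (as in the lookahead answer): the regex collects every word made of letters and digits, and a loop then discards any word that lacks either a letter or a digit. The function name `mixedWords` is hypothetical.

```php
<?php
// Hypothetical helper: collect candidate words, then keep only the mixed ones.
function mixedWords(string $str): array
{
    // Step 1: every word made only of letters and digits (case-insensitive).
    preg_match_all('/\b[a-z0-9]+\b/i', $str, $matches);

    $result = [];
    foreach ($matches[0] as $word) {
        // Step 2: keep the word only if it has at least one letter AND one digit.
        if (preg_match('/[a-z]/i', $word) && preg_match('/[0-9]/', $word)) {
            $result[] = $word;
        }
    }
    return $result;
}

print_r(mixedWords('abc 123 a1bc3 1bac3'));
```

With the sample words from the question, only `a1bc3` and `1bac3` survive the filter.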
Matchu
2010-01-14 18:36:11
This will also match `abc` and `123`.
Rubens Farias
2010-01-14 18:37:36
I'll try another round of regex-writing, but in the meantime you can filter those out manually in the loop.
Matchu
2010-01-14 18:40:01
I don't want it to match `abc` and `123`, just `a1bc3` or `1bac3`.
manny
2010-01-14 18:40:07
Now throwing my support behind Gumbo's answer.
Matchu
2010-01-14 18:54:22
A:
Try this regular expression:
\b([a-z]+[0-9]+[a-z0-9]*|[0-9]+[a-z]+[a-z0-9]*)\b
Or more compact:
\b([a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b
This matches all words (note the word boundaries `\b`) that start either with one or more letters followed by one or more digits, or vice versa, optionally followed by any further letters or digits. So the condition of at least one letter and at least one digit is always fulfilled.
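A short sketch applying the compact pattern with PHP's `preg_match_all`, assuming PHP and the sample words from the question plus one mixed-case word. The `i` modifier is an addition here: without it, `[a-z]` matches only lowercase letters. The variable names are illustrative.

```php
<?php
// Compact pattern from this answer; the /i modifier makes [a-z] also match A-Z.
$pattern = '/\b([a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b/i';

$str = 'abc 123 a1bc3 1bac3 F0O';
preg_match_all($pattern, $str, $matches);

// $matches[0] holds the full matches; $matches[1] holds only the
// letters-then-digits or digits-then-letters prefix captured by the group.
print_r($matches[0]);
```

Words starting with a letter (`a1bc3`, `F0O`) are caught by the first branch of the alternation, and `1bac3` by the second.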
Gumbo
2010-01-14 18:39:55
@tj111: `\w` is not just `[A-Za-z0-9]` by definition. It usually contains more characters, such as `_` or other word characters outside ASCII.
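A quick check of the underscore case, assuming PHP and PCRE's default (non-Unicode) mode; the non-ASCII behaviour depends on locale and modifiers and is not shown here. The sample string `a_1` is made up.

```php
<?php
// \w also accepts the underscore, so it is wider than [A-Za-z0-9].
$word = 'a_1';

$withW     = preg_match('/^\w+$/', $word);          // 1: '_' counts as a word character
$withClass = preg_match('/^[A-Za-z0-9]+$/', $word); // 0: '_' is outside the class

var_dump($withW, $withClass);
```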
Gumbo
2010-01-14 18:47:36
It only works if the leading character is a number. Mark's expression works fine.
manny
2010-01-14 19:03:55
@manny: It *definitely* works for words that start with a letter too. That’s what the first branch in the alternation is for.
Gumbo
2010-01-14 19:08:09
A:
With lookaheads:
'/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i'
A quick test that also shows example usage:
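<?php
$str = 'foo bar F0O 8ar';
// preg_match_all fills $arr by reference; $arr[0] holds every full match
preg_match_all('/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i', $str, $arr);
// print one match per line
echo implode("\n", $arr[0]), "\n";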
Output:
F0O
8ar
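As a cross-check, assuming PHP and a made-up sample string, this sketch runs the lookahead pattern and Gumbo's compact alternation pattern (with the `i` modifier added) over the same input and compares their full matches.

```php
<?php
$str = 'abc 123 a1bc3 1bac3 F0O 8ar foo';

// Lookahead version: reject words that are all digits or all letters.
$lookahead   = '/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i';
// Alternation version: the word must begin with letters then digits, or digits then letters.
$alternation = '/\b([a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b/i';

preg_match_all($lookahead, $str, $a);
preg_match_all($alternation, $str, $b);

// Compare only the full matches (index 0); the second pattern also has a capture group.
var_dump($a[0] === $b[0]);
print_r($a[0]);
```

The lookahead form states the two exclusions directly, while the alternation form encodes the same requirement as a letter/digit transition at the start of the word; on words made only of letters and digits they select the same set.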
Mark Byers
2010-01-14 18:42:41