Simple Regex:
\w+
This matches a string of "word" characters. That is almost what you want.
This is slightly more accurate:
\w(?<!\d)[\w'-]*
It matches a run of word characters, apostrophes, and hyphens, with the lookbehind ensuring that the first character is not a digit.
Here are my matches:
1 LOLOLOL
2 YOU'VE
3 BEEN
4 PWN3D
5 einszwei
6 drei
Now, that's more like it.
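For comparison, here is a minimal sketch of the two patterns side by side, using Python's re module. The sample string is made up for illustration and is not the input from the question.

```python
import re

# Hypothetical sample text, chosen only to show the difference.
text = "YOU'VE BEEN PWN3D 123 well-known"

# \w+ splits at apostrophes and hyphens and also accepts pure numbers.
simple = re.findall(r"\w+", text)

# The lookbehind rejects a digit as the first character; the character
# class then allows word characters, apostrophes, and hyphens.
stricter = re.findall(r"\w(?<!\d)[\w'-]*", text)

print(simple)    # ['YOU', 'VE', 'BEEN', 'PWN3D', '123', 'well', 'known']
print(stricter)  # ["YOU'VE", 'BEEN', 'PWN3D', 'well-known']
```

Note that digits are only rejected in the first position, so PWN3D still matches as one word.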
EDIT:
The reason for the negative lookbehind is that some regex flavors support Unicode characters. Using [a-zA-Z] would miss quite a few "word" characters that are desirable. Allowing \w and disallowing \d includes all Unicode characters that would conceivably start a word in any block of text.
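A small sketch of that difference, assuming a flavor where \w is Unicode-aware (Python 3 str patterns behave this way by default). The sample string is hypothetical.

```python
import re

# Hypothetical sample text with non-ASCII letters.
text = "Ärger über 3 naïve Fälle"

# An ASCII-only class breaks words apart at every accented letter.
ascii_only = re.findall(r"[a-zA-Z]+", text)

# \w minus \d keeps the accented letters and still skips the lone digit.
unicode_aware = re.findall(r"\w(?<!\d)[\w'-]*", text)

print(ascii_only)     # ['rger', 'ber', 'na', 've', 'F', 'lle']
print(unicode_aware)  # ['Ärger', 'über', 'naïve', 'Fälle']
```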
EDIT 2:
I have found a more concise way to get the effect of the negative lookbehind: a negated character class that excludes both non-word characters and digits, which leaves exactly the word characters that are not digits.
[^\W\d][\w'-]*(?<=\w)
This is the same as the above with the exception that it also ensures that the word ends with a word character. And, finally, there is:
[^\W\d](\w|[-']{1,2}(?=\w))*
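This ensures that there are no more than two non-word characters (hyphens or apostrophes) in a row, and that each such run is followed by a word character. That is, it matches "word-up" and "word--up" as single words, but not "word---up". If you want it to match "word---up" as well, you can change the 2 to a 3. If you want it to match "word-up" but not "word--up", remove the {1,2} quantifier entirely.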
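To see how the last two patterns behave, here is a short sketch in Python. The helper name words and the sample string are made up for illustration. It uses re.finditer rather than re.findall, because the final pattern contains a capturing group and findall would return the group instead of the whole match.

```python
import re

def words(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Must end on a word character; separators inside are unrestricted.
ends_on_word_char = r"[^\W\d][\w'-]*(?<=\w)"

# At most two separators in a row, each run followed by a word character.
limited_runs = r"[^\W\d](\w|[-']{1,2}(?=\w))*"

# Hypothetical sample text.
text = "word-up word--up word---up dogs' end-"

print(words(ends_on_word_char, text))
# ['word-up', 'word--up', 'word---up', 'dogs', 'end']

print(words(limited_runs, text))
# ['word-up', 'word--up', 'word', 'up', 'dogs', 'end']
```

With {1,2}, "word---up" is not rejected outright; it is split into "word" and "up", because the run of three hyphens cannot be consumed as part of one match.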