On character class
Your pattern contains this subpattern:
[\s\w]*
The […] is a character class. Something like [aeiou] matches any one of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches any one character except the lowercase vowels.
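As a quick illustration of the two kinds of class, here is a minimal sketch using Python's re module. The choice of Python and the sample string are assumptions for demonstration only; the question may well use a different regex flavor.

```python
import re

text = "regex"  # hypothetical sample input

# [aeiou] matches exactly one lowercase vowel per match.
vowels = re.findall(r"[aeiou]", text)

# [^aeiou] matches exactly one character that is NOT a lowercase vowel.
non_vowels = re.findall(r"[^aeiou]", text)

print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
```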
\s is the shorthand for the whitespace character class; \w is the shorthand for the word character class. Neither contains the hyphen.
The * is the zero-or-more repetition specifier.
Now you should understand why this pattern does not match a hyphen: it matches zero or more characters, each of which is either a whitespace character or a word character. If you want to match a hyphen, then you can include it in the character class.
[\s\w-]*
If you also want to include the period, question mark, and exclamation mark, for example, then you can simply add them in as well:
[\s\w.!?-]*
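To see the difference between the three patterns, here is a small sketch in Python's re module. The sample string and the helper name matched_prefix are hypothetical, and other regex flavors may differ in details.

```python
import re

sample = "well-known fact!"  # hypothetical sample input

def matched_prefix(pattern, text):
    # re.match anchors at the start of the string; because each pattern
    # ends in *, it always succeeds (possibly with an empty match).
    return re.match(pattern, text).group()

print(matched_prefix(r"[\s\w]*", sample))      # prints: well
print(matched_prefix(r"[\s\w-]*", sample))     # prints: well-known fact
print(matched_prefix(r"[\s\w.!?-]*", sample))  # prints: well-known fact!
```

The first pattern stops at the hyphen, the second stops at the exclamation mark, and the third consumes the whole string.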
Special note on hyphen
BE CAUTIOUS when including the hyphen in a character class. Inside a character class definition, it is a regex metacharacter that defines a character range. For example, [a-z] matches any one character in the range from 'a' to 'z', inclusive. By contrast, [az-] matches one of exactly 3 characters: 'a', 'z', and '-'. When you put - as the last element in a character class, it becomes a literal hyphen instead of a range definition. You can also put it as the first element, or escape it (by preceding it with a backslash, which is the way you escape all other regex metacharacters too).
That is, the following 3 character classes are equivalent:
[az-] [-az] [a\-z]
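The contrast between the range and the literal hyphen can be checked with a short sketch, again assuming Python's re module. The probe string is made up for illustration.

```python
import re

range_class = re.compile(r"[a-z]")
literal_classes = [re.compile(p) for p in (r"[az-]", r"[-az]", r"[a\-z]")]

probe = "a-mz"  # hypothetical sample input

# The range matches every lowercase letter but not the hyphen.
range_hits = range_class.findall(probe)
print(range_hits)  # ['a', 'm', 'z']

# Each literal-hyphen class matches only 'a', 'z', and '-'.
literal_hits = [cls.findall(probe) for cls in literal_classes]
for cls, hits in zip(literal_classes, literal_hits):
    print(cls.pattern, hits)  # each list is ['a', '-', 'z']
```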
Related questions