I sometimes want to match whitespace but not newline. So far I've been resorting to [ \t]. Is there a less awkward way?
A:
Use a double-negative:
/[^\S\n]/
That is, not (not-whitespace or newline). Distributing the outer not (i.e., the complementing ^ in the character class) with De Morgan's law, this is equivalent to “whitespace and not newline,” but don't take my word for it:
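#! /usr/bin/perl
use warnings;
use strict;
# Single quotes keep each escape as literal text (backslash plus letter) for display.
for (' ', '\f', '\n', '\r', '\t') {
    my $qq = qq["$_"];    # wrap the text in double quotes, e.g. "\n"
    # eval turns the quoted text into the real character before matching
    printf "%-4s => %s\n", $qq, (eval $qq) =~ /[^\S\n]/ ? "match" : "no match";
}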
Output:
" "  => match
"\f" => match
"\n" => no match
"\r" => match
"\t" => match
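For a broader check than the five characters above, here is a minimal sketch that compares the double-negative class with a spelled-out form, `(?!\n)\s` (a whitespace character that is not a newline), for every ASCII character. The lookahead pattern and the variable names are my own illustration, not part of the original test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative check: does [^\S\n] agree with "whitespace but not newline"
# for every ASCII code point? (@disagree and @matched are hypothetical names.)
my @disagree;    # code points where the two patterns differ
my @matched;     # code points accepted by the double-negative class

for my $code (0 .. 127) {
    my $ch = chr $code;
    my $double_negative = $ch =~ /\A[^\S\n]\z/ ? 1 : 0;
    my $spelled_out     = $ch =~ /\A(?!\n)\s\z/ ? 1 : 0;
    push @disagree, $code if $double_negative != $spelled_out;
    push @matched,  $code if $double_negative;
}

printf "disagreements: %d\n", scalar @disagree;
print  "matched code points: @matched\n";
```

The two patterns should agree everywhere. The exact list of matched code points can vary with the Perl version, because whether the vertical tab counts as `\s` has changed over time, so treat the printed list as informational.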
This trick is also handy for matching alphabetic characters. Remember that \w matches “word characters”: alphabetic characters, but also digits and underscore. We ugly-Americans sometimes want to write it as, say,
if (/^[A-Za-z]+$/) { ... }
but a double-negative character-class can respect the locale:
if (/^[^\W\d_]+$/) { ... }
That is a bit opaque, so a POSIX character class may express the intent better:
if (/^[[:alpha:]]+$/) { ... }
or, as szbalint suggested,
if (/^\p{Letter}+$/) { ... }
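To see how the four patterns differ, here is a small sketch that runs each of them over a few sample strings. The samples and labels are my own, and I use a Cyrillic word written with `\x{...}` escapes (an assumption made to get Unicode matching without relying on any particular locale being installed), so this shows Unicode behavior rather than locale behavior.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical labels for the four patterns discussed above.
my @patterns = (
    [ ascii  => qr/^[A-Za-z]+$/    ],
    [ double => qr/^[^\W\d_]+$/    ],
    [ posix  => qr/^[[:alpha:]]+$/ ],
    [ prop   => qr/^\p{Letter}+$/  ],
);

# Sample inputs; the last is three Cyrillic letters, given as escapes
# so the script does not depend on the source file's encoding.
my @samples = (
    [ hello      => 'hello' ],
    [ abc123     => 'abc123' ],
    [ snake_case => 'snake_case' ],
    [ cyrillic   => "\x{43C}\x{438}\x{440}" ],
);

my %result;
for my $sample (@samples) {
    my ($label, $text) = @$sample;
    for my $pattern (@patterns) {
        my ($name, $re) = @$pattern;
        $result{$label}{$name} = $text =~ $re ? 1 : 0;
    }
    # Print labels only, to avoid "Wide character" warnings on output.
    printf "%-10s %s\n", $label,
        join ' ', map { "$_->[0]=$result{$label}{ $_->[0] }" } @patterns;
}
```

All four patterns accept `hello` and reject the strings containing digits or an underscore; only `[A-Za-z]` rejects the Cyrillic word.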
Greg Bacon
2010-08-12 15:07:59
Just a short note: for Unicode-aware matching, `/\p{Letter}/` can also be used. It includes letters, but not numbers.
szbalint
2010-08-12 15:45:02
Clever, but the behavior is very surprising, and I don't see how it's less awkward.
Qwertie
2010-08-12 16:04:12
@Qwertie: what's surprising? Less awkward than what?
ysth
2010-08-12 16:06:56