I sometimes want to match whitespace but not newline. So far I've been resorting to [ \t]. Is there a less awkward way?

+10  A: 

Use a double-negative:

/[^\S\n]/

That is, not (not-whitespace or newline). Distributing the outer not (i.e., the complementing ^ in the character class) with De Morgan's law, this is equivalent to “whitespace and not newline,” but don't take my word for it:

#! /usr/bin/perl

use warnings;
use strict;

for (' ', '\f', '\n', '\r', '\t') {
  my $qq = qq["$_"];
  printf "%-4s => %s\n", $qq, (eval $qq) =~ /[^\S\n]/ ? "match" : "no match";
}

Output:

" "  => match
"\f" => match
"\n" => no match
"\r" => match
"\t" => match

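For a broader check than the five samples above, here is a minimal sketch that compares the double-negative class against an explicit “whitespace, but not newline” pattern for every ASCII character. The lookahead form (?!\n)\s and the variable names are my own choices for illustration, not part of the original test.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# Collect every ASCII code point where the two patterns disagree.
my @disagree;
for my $code (0 .. 127) {
  my $ch = chr $code;
  # The double-negative class from above.
  my $double_negative = $ch =~ /[^\S\n]/ ? 1 : 0;
  # The same condition spelled out: whitespace that is not a newline.
  my $explicit = $ch =~ /(?!\n)\s/ ? 1 : 0;
  push @disagree, $code if $double_negative != $explicit;
}

print @disagree
  ? "disagree at: @disagree\n"
  : "the two patterns agree on all of ASCII\n";
```

Both patterns rely on the same definition of \s, so they should agree whichever characters your perl counts as whitespace.
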
This trick is also handy for matching alphabetic characters. Remember that \w matches “word characters”: alphabetic characters, but also digits and underscore. We ugly-Americans sometimes want to write it as, say,

if (/^[A-Za-z]+$/) { ... }

but a double-negative character-class can respect the locale:

if (/^[^\W\d_]+$/) { ... }

That is a bit opaque, so a POSIX character-class may be better at expressing the intent:

if (/^[[:alpha:]]+$/) { ... }

or, as szbalint suggested:

if (/^\p{Letter}+$/) { ... }
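
To see how the four spellings differ, here is a rough sketch that runs each one against a few inputs. The sample strings and the names @patterns, @inputs and %result are made up for illustration. It uses a Unicode string (Greek alpha and beta, written as escapes) rather than a locale, on the assumption that the string's code points above 255 make Perl apply Unicode rules to \W, \d and [[:alpha:]].

```perl
#!/usr/bin/perl

use warnings;
use strict;

# The four ways of saying "letters only" discussed above.
my @patterns = (
  [ '[A-Za-z]'    => qr/^[A-Za-z]+$/    ],
  [ '[^\W\d_]'    => qr/^[^\W\d_]+$/    ],
  [ '[[:alpha:]]' => qr/^[[:alpha:]]+$/ ],
  [ '\p{Letter}'  => qr/^\p{Letter}+$/  ],
);

# Hypothetical sample inputs: plain ASCII, non-ASCII letters, and a non-letter mix.
my @inputs = (
  [ 'ascii' => 'abc'            ],
  [ 'greek' => "\x{3b1}\x{3b2}" ],  # alpha, beta
  [ 'mixed' => 'abc_1'          ],
);

my %result;
for my $in (@inputs) {
  my ($label, $text) = @$in;
  for my $p (@patterns) {
    my ($name, $re) = @$p;
    $result{$label}{$name} = $text =~ $re ? 1 : 0;
    printf "%-6s %-12s %s\n", $label, $name,
      $result{$label}{$name} ? "match" : "no match";
  }
}
```

Only the ASCII-range class should reject the Greek letters, and all four should reject the input containing a digit and an underscore.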
Greg Bacon
Just a short note: for Unicode-aware matching, `/\p{Letter}/` can also be used. It includes letters, but not numbers.
szbalint
Clever, but the behavior is very surprising, and I don't see how it's less awkward.
Qwertie
@Qwertie: what's surprising? Less awkward than what?
ysth