ansaurus

Question

Match whitespace but not newlines (Perl)

Answer 1

+10 A:

Use a double-negative:

/[^\S\n]/

That is, not (not-whitespace or newline). Distributing the outer not (i.e., the complementing ^ in the character class) with De Morgan's law, this is equivalent to “whitespace and not newline,” but don't take my word for it:

#! /usr/bin/perl

use warnings;
use strict;

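# Single-quoted, so each escape stays literal for display; eval expands it below.
for (' ', '\f', '\n', '\r', '\t') {
  my $qq = qq["$_"];  # wrap in double quotes so eval yields the real character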
  printf "%-4s => %s\n", $qq, (eval $qq) =~ /[^\S\n]/ ? "match" : "no match";
}

Output:

" "  => match
"\f" => match
"\n" => no match
"\r" => match
"\t" => match
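As a further illustration, here is a minimal sketch of the same class used in a substitution: it collapses runs of non-newline whitespace to a single space while leaving line breaks alone. The helper name squeeze_inline_ws and the sample string are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): squeeze each run of
# whitespace other than newline down to a single space.
sub squeeze_inline_ws {
    my ($text) = @_;
    $text =~ s/[^\S\n]+/ /g;   # [^\S\n] = whitespace and not newline
    return $text;
}

# Sample input mixing spaces, tabs and a form feed across two lines.
my $in = "a \t b\n\tc\f d\n";
print squeeze_inline_ws($in);
```

With this input the two newlines survive, and every other run of whitespace becomes one space.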

This trick is also handy for matching alphabetic characters. Remember that \w matches “word characters”: alphabetic characters, but also digits and underscore. We ugly-Americans sometimes want to write it as, say,

if (/^[A-Za-z]+$/) { ... }

but a double-negative character-class can respect the locale:

if (/^[^\W\d_]+$/) { ... }

That is a bit opaque, so a POSIX character-class may be better at expressing the intent:

if (/^[[:alpha:]]+$/) { ... }

or, as szbalint suggested,

if (/^\p{Letter}+$/) { ... }
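To compare the four patterns side by side, a rough sketch follows. The labels, the helper matching_patterns, and the sample strings are all invented for illustration. The non-ASCII sample is a Cyrillic word written with \x{...} escapes, an assumption chosen so that Perl stores the string as characters and applies Unicode rules to \W and [[:alpha:]]; strings limited to bytes 128-255 can behave differently depending on locale and Perl version.

```perl
use strict;
use warnings;

# Hypothetical labels for the four patterns discussed above.
my %pattern = (
    ascii   => qr/^[A-Za-z]+$/,
    dblneg  => qr/^[^\W\d_]+$/,
    posix   => qr/^[[:alpha:]]+$/,
    unicode => qr/^\p{Letter}+$/,
);

# Return a comma-separated list of the labels whose pattern matches $s.
sub matching_patterns {
    my ($s) = @_;
    return join ',', grep { $s =~ $pattern{$_} } sort keys %pattern;
}

binmode STDOUT, ':encoding(UTF-8)';

# The second sample is a six-letter Cyrillic word built from code points.
my $cyrillic = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";
for my $s ('hello', $cyrillic, 'abc1', 'a_b') {
    printf "%s => %s\n", $s, matching_patterns($s) || 'none';
}
```

In this sketch only the ASCII range rejects the Cyrillic word, and all four patterns reject the strings containing a digit or an underscore.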

Greg Bacon 2010-08-12 15:07:59

Just a short note: for Unicode-aware matching, `/\p{Letter}/` can also be used. It matches letters, but not digits.

szbalint 2010-08-12 15:45:02

Clever, but the behavior is very surprising, and I don't see how it's less awkward.

Qwertie 2010-08-12 16:04:12

@Qwertie: what's surprising? Less awkward than what?

ysth 2010-08-12 16:06:56
