Is there a way to do this in one line?

$x =~ s/^\s+//;
$x =~ s/\s+$//;

In other words, remove all leading and trailing whitespace from a string.

+4  A: 

Here you go: $x =~ s/\A\s*(.*?)\s*\z/$1/;

moritz
+1  A: 

$x =~ s/(^\s+)|(\s+$)//g;

Lou Franco
That's how I always do it. It seems by far the easiest.
Kip
Yeah, and it says what it means: replace beginning whitespace OR ending whitespace with nothing, globally.
Lou Franco
Capturing parens aren't used or needed -- generally you can replace them with grouping parens (?:...), but in this case precedence works out nicely and you can remove the parentheses altogether.
ephemient
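A small sketch of the precedence point above, assuming plain Perl 5. Alternation binds more loosely than anchors and quantifiers, so each anchor stays with its own branch whether or not the branches are grouped. The sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen only for illustration.
my @samples = ("  foo bar  ", "foo", "\t foo\n", "   ");

for my $s (@samples) {
    # Three spellings of the same substitution, each applied to a fresh copy.
    (my $capturing = $s) =~ s/(^\s+)|(\s+$)//g;      # capturing groups (unused)
    (my $grouping  = $s) =~ s/(?:^\s+)|(?:\s+$)//g;  # non-capturing groups
    (my $bare      = $s) =~ s/^\s+|\s+$//g;          # no groups at all

    # All three should agree, since | binds more loosely than ^, \s+ and $.
    die "mismatch for '$s'\n"
        unless $capturing eq $grouping && $grouping eq $bare;
}
print "all three forms agree\n";
```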
That's how I do that, without the capturing, as ephemient already pointed out.
PhiLho
Using a single regex is much slower than using two regexes. See Tanktalus' benchmarks. This can become important if you're doing a lot of trimming.
Michael Carman
A: 
$x =~ s/^\s*(.*?)\s*$/$1/;
Lev
Using \1 instead of $1 in the replacement string is discouraged in Perl, see "Warning on \1 vs $1" in `perldoc perlre`.
ephemient
Since the quantifiers are greedy, you don't need to say something like [^\s] after matching \s+. Also, instead of [^\s], you can just say \S. The capitalized versions are the complemented character class. :)
brian d foy
brian: Yes, but only if I make the .* ungreedy.
Lev
A: 

Or this: s/\A\s*|\s*\Z//g

ephemient
A: 
s/^\s*(\S*\S)\s*$/$1/
Zsolt Botykai
Using \1 instead of $1 in the replacement string is discouraged in Perl, see "Warning on \1 vs $1" in `perldoc perlre`.
ephemient
You are right, I'll correct it.
Zsolt Botykai
The problem here is you require at least 2 non-whitespace characters in the string, or it won't work.
bart
Yep, I'll correct it again!
Zsolt Botykai
+17  A: 
$x =~ s/^\s+|\s+$//g;

or

s/^\s+//, s/\s+$// for $x;
runrig
Option 2: That's a nice trick, but doesn't really answer the question :D
ephemient
How does that not answer the question? It's trimming from both sides without the performance-sucking alternation of the single regex.
brian d foy
s/^\s*(.*?)\s*/\1/; has to try more alternatives than either of those two options
Brad Gilbert
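For readers unfamiliar with the second form in runrig's answer, here is a minimal sketch of why it works, assuming standard Perl 5 semantics. The statement-modifier `for` aliases `$_` to each item in its list, and a bare `s///` operates on `$_`. The variable names and sample values are made up.

```perl
use strict;
use warnings;

my $x = "   padded value\t\n";

# "for $x" aliases $_ to $x for this one statement, so both substitutions
# (joined by the comma operator) modify $x directly.
s/^\s+//, s/\s+$// for $x;

print '[', $x, "]\n";   # prints [padded value]

# The same aliasing covers several variables in one statement.
my ($first, $second) = ("  a ", " b  ");
s/^\s+//, s/\s+$// for $first, $second;

print '[', $first, '][', $second, "]\n";   # prints [a][b]
```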
+23  A: 

My first question is ... why? I don't find any of the single-regexp solutions any more readable than the two regexps you started with. And they sure aren't anywhere near as fast.

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw(:all);

my $a = 'a' x 1_000;

my @x = (
         "    $a   ",
         "$a   ",
         $a,
         "    $a"
        );

cmpthese(-5,
         {
             single => sub {
                 for my $s (@x)
                 {
                     my $x = $s;
                     $x =~ s/^\s+|\s+$//g;
                 }
             },
             double => sub {
                 for my $s (@x)
                 {
                     my $x = $s;
                     $x =~ s/^\s+//;
                     $x =~ s/\s+$//;
                 }
             },
             trick => sub {
                 for my $s (@x)
                 {
                     my $x = $s;
                     s/^\s+//, s/\s+$// for $x;
                 }
             },
             capture => sub {
                 for my $s (@x)
                 {
                     my $x = $s;
                     $x =~ s/\A\s*(.*?)\s*\z/$1/
                 }
             },
             kramercap => sub {
                 for my $s (@x)
                 {
                     my $x = $s;
                     ($x) = $x =~ /^\s*(.*?)\s*$/
                 }
             },
         }
        );

gives results on my machine of:

             Rate    single   capture kramercap     trick    double
single     2541/s        --      -12%      -13%      -96%      -96%
capture    2902/s       14%        --       -0%      -95%      -96%
kramercap  2911/s       15%        0%        --      -95%      -96%
trick     60381/s     2276%     1981%     1974%        --       -7%
double    65162/s     2464%     2145%     2138%        8%        --

Edit: runrig is right, though it changes little. I've updated the code to copy the string before modification, which, of course, slows things down. I also took into account brian d foy's suggestion in another answer to use a longer string (though a million seemed like overkill). However, that also suggests that before you choose the trick style, you should figure out what your string lengths are like: the advantages of trick are lessened with shorter strings. At all lengths I've tested, though, double wins. And it's still easier on the eyes.

Tanktalus
You're assuming that he's doing this in Perl, and that might not be the case. "Perl-compatible" always raises red flags for me.
Andy Lester
True - it's a bit confusing to see both perl and pcre tags...
Tanktalus
All of your "tests" will change @x on the first iteration. So none is testing what you think. You need to copy @x in your subs. And in the double solution, don't wrap it in a for loop, just use "for @x".
runrig
Sorry, I meant "in the trick solution", don't wrap, use "for @x" (but after fix to copy to a temp array first).
runrig
Even in PCRE it's still possible to do it in 2 steps, for most applications.
bart
+5  A: 

Arguing from the heretical, why do it at all? All of the above solutions are "correct" in that they trim whitespace from both sides of the string in one pass, but none are terribly readable (except maybe this one). Unless the audience for your code is composed of expert-level Perl coders, each of the above candidates should have a comment describing what it does (probably a good idea anyway). By contrast, these two lines accomplish the same thing without using lookaheads, wildcards, midichlorians or anything that isn't immediately obvious to a programmer of moderate experience:

$string =~ s/^\s+//;
$string =~ s/\s+$//;

There is (arguably) a performance hit, but as long as you aren't concerned with a few microseconds at execution the added readability will be worth it. IMHO.

Logan
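If the two lines are needed in more than one place, one option is to wrap them in a small subroutine. This is only a sketch: the name `trim` is hypothetical and not from the thread, and it returns a trimmed copy rather than modifying its argument.

```perl
use strict;
use warnings;

# Hypothetical helper (the name "trim" is an assumption, not from the thread).
# It works on a copy, so the caller's string is left untouched.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;   # strip leading whitespace
    $string =~ s/\s+$//;   # strip trailing whitespace
    return $string;
}

print '[', trim("  hello world \n"), "]\n";   # prints [hello world]
```

Keeping the two substitutions separate inside the helper preserves the readability argument made above while giving callers a one-line call.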
Performance hit? Who could argue that? It's more than twice as fast as any other solution listed.
Tanktalus
Fair enough, I didn't benchmark the code because I wanted to get out the door for a (very) late lunch. Glad to know there's no performance hit.
Logan
Perl expert? People in my Learning Perl course would understand all of these solutions by the end of the second day.
brian d foy
+7  A: 

Funny you should bring this up!

I recently read an article analyzing the performance of twelve (!) different trim implementations.

Although the article specifically uses the JavaScript regex implementation, it uses Perl syntax, so I think it's apropos to this discussion.

benjismith
+6  A: 

Tanktalus shows a benchmark for very small strings, but the problems get worse as the strings get bigger. In his code, I changed the top portion:

my $a = 'a' x 1_000_000;

my @x = (
  "   $a   ",
  "$a    ",
  $a,
  "    $a"
  );

I get these results:

          Rate  single capture   trick  double
single  2.09/s      --    -12%    -98%    -98%
capture 2.37/s     13%      --    -98%    -98%
trick   96.0/s   4491%   3948%      --     -0%
double  96.4/s   4512%   3967%      0%      --

As the string gets bigger, "trick" and "double" perform almost the same, and "single", the common solution that most people go for (including me, because I can't break that habit even though I know this), really starts to suck.

Whenever you look at a benchmark, think about what it's telling you. To see if you understand it, change the data and try again. Make arrays long, scalars big, and so on. Make loops, greps, or regexes find stuff at the start, middle, and end. See if the new results match your prediction. Figure out what the trend is. Does performance get better and better, approach a limit, peak then start to decline, or something else?

brian d foy
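One way to follow that advice is to rerun the comparison at several input sizes and watch the trend. The sketch below assumes the core `Benchmark` module. The lengths are arbitrary picks, and only the "single" and "double" variants are included to keep it short. No timings are shown because they depend on the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Each candidate takes a string and returns a trimmed copy.
my %trim = (
    single => sub { my ($x) = @_; $x =~ s/^\s+|\s+$//g; return $x },
    double => sub { my ($x) = @_; $x =~ s/^\s+//; $x =~ s/\s+$//; return $x },
);

# Hypothetical payload lengths; adjust them to resemble real data.
for my $len (10, 1_000, 100_000) {
    my $padded = '   ' . ('a' x $len) . '   ';
    print "--- payload length $len ---\n";

    # A negative count asks Benchmark to run each sub for at least
    # that many CPU seconds.
    cmpthese(-1, {
        single => sub { $trim{single}->($padded) },
        double => sub { $trim{double}->($padded) },
    });
}
```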
+1  A: 

I usually do it like this:

($foo) = $foo =~ /^\s*(.*?)\s*$/;

Everything between the leading spaces and the trailing spaces is grouped and returned, so I can assign it to the same old variable.

jkramer
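A brief sketch of how this form behaves, assuming default regex modifiers. In list context a successful match returns its captures, and a failed match returns an empty list, which leaves the variable undefined. Because `.` does not match a newline by default, a string with an embedded newline hits that failure case. Adding `/s` is one way around it. The sample strings are made up.

```perl
use strict;
use warnings;

my $foo = "  some text  ";
($foo) = $foo =~ /^\s*(.*?)\s*$/;
print '[', $foo, "]\n";   # prints [some text]

# Caveat: "." stops at a newline, so this match fails and the list
# assignment leaves $multi undefined.
my $multi = "  line one\nline two  ";
($multi) = $multi =~ /^\s*(.*?)\s*$/;
print defined($multi) ? "defined\n" : "undef\n";   # prints undef

# With /s, "." also matches newlines, so the whole body is captured.
my $fixed = "  line one\nline two  ";
($fixed) = $fixed =~ /^\s*(.*?)\s*$/s;
print '[', $fixed, "]\n";
```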
A: 

$var1 =~ s/(^\s*)(.*?)(\s*$)/$2/;

Shashidhar Vajramatti