I have a string like this:

a  b   c       d

I process my string like this:

    chomp $line;
    my @tokens = split /\s+/, $line;
    my @new_tokens;
    foreach my $token (@tokens) {
        push @new_tokens, some_complex_function( $token );
    }
    my $new_str = join ' ', @new_tokens;

I'd like to re-join the string with the original whitespace. Is there some way that I can store the whitespace from split and re-use it later? Or is this going to be a huge pain? It's mostly cosmetic, but I'd like to preserve the original spaces from the input string.

+4  A: 

Just split on word boundaries:

split /\b/, $line;

For your example, this will give:

('a','  ','b','   ','c','       ','d')

EDIT: As brian d foy pointed out, \b uses the wrong character classes: it matches between \w and \W characters, not at whitespace boundaries. Following my original idea, I came up with look-around assertions instead. This looks way more complicated than Ether's answer, though:

split /(?:(?<=\S)(?=\s)|(?<=\s)(?=\S))/, $line;
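
To illustrate the difference between the two patterns, here is a small sketch. The input string `foo-bar  baz` is a made-up example (not from the question), chosen because it contains a hyphen, which is not a word character:

```perl
use strict;
use warnings;

# Hypothetical input containing a hyphen, which is not a word character.
my $line = 'foo-bar  baz';

# \b matches between \w and \W, so it also splits around the hyphen.
my @by_word_boundary = split /\b/, $line;

# The look-around version splits only where \S meets \s (or \s meets \S).
my @by_space_boundary = split /(?:(?<=\S)(?=\s)|(?<=\s)(?=\S))/, $line;

print join('|', @by_word_boundary), "\n";     # foo|-|bar|  |baz
print join('|', @by_space_boundary), "\n";    # foo-bar|  |baz
```

With `\b` the token `foo-bar` is broken into three pieces, while the look-around pattern keeps it whole.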
hillu
This may accidentally split on things that aren't whitespace boundaries.
brian d foy
Thanks for pointing this out! I wrote the original answer without thinking about `\w` vs. `\s`. Edited my answer accordingly.
hillu
+16  A: 

If you split with a regex containing capturing parentheses, the text matched by each capture group is included in the resulting list (see perldoc -f split):

use Data::Dumper;
my @list = split /(\s+)/, 'a  b   c       d';
print Data::Dumper::Dumper(\@list);

$VAR1 = [
          'a',
          '  ',
          'b',
          '   ',
          'c',
          '       ',
          'd'
        ];
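
A minimal sketch of how this could be plugged into the loop from the question. The body of `some_complex_function` is unknown, so `uc` stands in for it here purely as a placeholder:

```perl
use strict;
use warnings;

# Stand-in for the question's some_complex_function; uc is only a placeholder.
sub some_complex_function {
    my ($token) = @_;
    return uc $token;
}

my $line = 'a  b   c       d';

# Capturing parentheses keep the whitespace runs in the returned list.
my @pieces = split /(\s+)/, $line;

# Transform only the non-whitespace pieces; separators pass through unchanged.
my @new_pieces = map { /\S/ ? some_complex_function($_) : $_ } @pieces;

# Join on the empty string, since the separators are already in the list.
my $new_str = join '', @new_pieces;

print "$new_str\n";    # A  B   C       D
```

If the line starts with whitespace, split returns an empty first field; the `/\S/` test leaves it untouched, so the leading whitespace survives the round trip as well.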
Ether
+3  A: 

Why don't you simply do: my $new_str = uc( $line ); ?

UPDATE: the uc() above was just shorthand for a "more complex function".

More generally, you can also use a substitution with the /e modifier:

$line =~ s/(\S+)/more_complex_function($1)/ge;
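
As a runnable sketch of the substitution approach: `more_complex_function` is not defined anywhere in the thread, so the version below is a hypothetical stand-in that just upper-cases its argument. The line is copied first so the original stays available:

```perl
use strict;
use warnings;

# Hypothetical stand-in for more_complex_function; the real one is not shown.
sub more_complex_function {
    my ($token) = @_;
    return uc $token;
}

my $line = 'a  b   c       d';

# Copy $line, then replace each run of non-whitespace in the copy.
# /e evaluates the replacement as code; /g applies it to every match.
(my $new_str = $line) =~ s/(\S+)/more_complex_function($1)/ge;

print "$new_str\n";    # A  B   C       D
```

Because only the `\S+` runs are replaced, the whitespace between them is never touched.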
depesz
Because my real case is more complicated, and this is just an example.
James Thompson