ansaurus

Question

Splitting a String into Tokens and Storing the Delimiters in Perl

Answer 1

+4 A:

Just split on word boundaries:

split /\b/, $line;

For your example, this will give:

('a','  ','b','   ','c','       ','d')

EDIT: As brian d foy pointed out, \b uses the wrong character classes: it matches between \w and \W characters, not between whitespace and non-whitespace. Following my original idea, I came up with look-around assertions instead. This looks far more complicated than Ether's answer, though:

split /(?:(?<=\S)(?=\s)|(?<=\s)(?=\S))/, $line;
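To make the difference concrete, here is a small sketch comparing the two patterns. The input string is a made-up example containing a hyphen; it is not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical input with punctuation, chosen only for illustration.
my $line = "foo-bar  baz";

# \b matches wherever \w meets \W, so the hyphen becomes its own field.
my @by_word_boundary = split /\b/, $line;

# The look-around pattern matches only where \S meets \s or \s meets \S.
my @by_space_boundary = split /(?:(?<=\S)(?=\s)|(?<=\s)(?=\S))/, $line;

print join('|', @by_word_boundary),  "\n";
print join('|', @by_space_boundary), "\n";
```

The first split should yield five fields (foo, -, bar, two spaces, baz), while the second keeps foo-bar together and yields three.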

hillu 2009-12-14 07:41:30

This may accidentally split on things that aren't whitespace boundaries.

brian d foy 2009-12-14 09:04:11

Thanks for pointing this out! I wrote the original answer without thinking about `\w` vs. `\s`. Edited my answer accordingly.

hillu 2009-12-14 12:29:36

Answer 2

+16 A:

If you split with a regex with capturing parentheses, the split pattern will be included in the resulting list (see perldoc -f split):

use Data::Dumper;
my @list = split /(\s+)/, 'a  b   c       d';
print Data::Dumper::Dumper(\@list);

$VAR1 = [
          'a',
          '  ',
          'b',
          '   ',
          'c',
          '       ',
          'd'
        ];
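Since the delimiters are kept in the list, the tokens can be processed and the line reassembled with its original spacing. The following is only a sketch: uc stands in for whatever the real processing is, which is an assumption based on the other answers in this thread.

```perl
use strict;
use warnings;

my $line = 'a  b   c       d';

# Capturing parentheses make split return the whitespace runs as well.
my @parts = split /(\s+)/, $line;

# Tokens sit at even indices and delimiters at odd indices. A leading
# whitespace run yields an empty first field, so the parity still holds.
for my $i (grep { $_ % 2 == 0 } 0 .. $#parts) {
    $parts[$i] = uc $parts[$i];    # uc is a placeholder for the real work
}

# Joining with an empty separator restores the original spacing.
my $rebuilt = join '', @parts;
print "$rebuilt\n";
```

Because the whitespace fields are never modified, the rebuilt string differs from the input only in the tokens themselves.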

Ether 2009-12-14 07:47:50

Answer 3

+3 A:

Why don't you simply do: my $new_str = uc( $line ); ?

UPDATE: The uc() in the original was just shorthand for a "more complex function".

In that case, you can generally also do:

$line =~ s/(\S+)/more_complex_function($1)/ge;
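A runnable version of that substitution might look like the sketch below. The body of more_complex_function is invented here purely for illustration, since the thread never shows the real one.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real processing; it just brackets the token.
sub more_complex_function {
    my ($token) = @_;
    return "[$token]";
}

my $line = 'a  b   c       d';

# Copy first, then substitute on the copy so $line stays unchanged.
# /e evaluates the replacement as Perl code; /g repeats it for every
# run of non-whitespace characters.
(my $processed = $line) =~ s/(\S+)/more_complex_function($1)/ge;

print "$processed\n";
```

Only the non-whitespace runs are rewritten, so the spacing between tokens is left exactly as it was, with no need to split and rejoin.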

depesz 2009-12-14 07:50:16

Because my real case is more complicated, and this is just an example.

James Thompson 2009-12-14 08:36:51
