ansaurus

Question

How can I split a string by whitespace unless inside of a single quoted string?

Answer 1

+13 A:

#!/usr/bin/perl

use strict; use warnings;
use Text::ParseWords;

# Arguments: delimiter pattern, keep flag (0 drops the quotes), line to parse.
my @words = parse_line('\s+', 0, "abcd efgh 'ijklm no pqrs' tuv");

use Data::Dumper;
print Dumper \@words;

Output:

C:\Temp> ff
$VAR1 = [
          'abcd',
          'efgh',
          'ijklm no pqrs',
          'tuv'
        ];

You can look at the source code for Text::ParseWords::parse_line to see the pattern used.
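For illustration only, the general idea can be sketched with a global match in a loop. The pattern here is a simplified stand-in and is not the one Text::ParseWords actually uses; it assumes single quotes only, with no escapes. The name `split_quoted` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns whitespace-separated fields, treating a
# single-quoted run as one field. No escapes, no double quotes.
sub split_quoted {
    my ($line) = @_;
    my @fields;
    # Try the quoted alternative first so a leading quote is not
    # consumed by the bare-word branch. \G resumes where the last
    # match ended.
    while ( $line =~ /\G\s*(?:'([^']*)'|(\S+))/gc ) {
        push @fields, defined $1 ? $1 : $2;
    }
    return @fields;
}

my @words = split_quoted("abcd efgh 'ijklm no pqrs' tuv");
print join( '|', @words ), "\n";    # abcd|efgh|ijklm no pqrs|tuv
```

For anything beyond a quick script, the module is likely the safer choice, since it also deals with double quotes and backslash escapes.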

Sinan Ünür 2010-03-17 03:31:06

I love how every "how do I do this?" question I have ever had about Perl has been quickly answered by "Use this module that does exactly what you want."

Jergason 2010-03-17 03:40:03

Figures there is a package to do exactly what I need. I wasn't sure what I was looking for. You're a rock star, thanks!

Kivin 2010-03-17 04:24:02

@Jergason blame it on the wonderful people who, when they *don't* find exactly what they need, and have to write it themselves, CPAN the result afterwards. :)

hobbs 2010-03-17 04:36:23

And then blame the wonderful people who write CPAN modules that use every other possible CPAN module, no matter how tiny, so that you must pull in ten other mostly-useless modules.

Zan Lynx 2010-03-17 23:31:32

@zan FWIW, `Text::ParseWords` is in the core. Also, modules or distributions with giant dependency lists are not that common.

Sinan Ünür 2010-03-18 00:11:46

Answer 2

+2 A:

So you've decided to use a regex? Now you have two problems.

Allow me to infer a little bit. You want an arbitrary number of fields separated by spaces, where a field is either a run of text containing no spaces, or a string that begins and ends with a quote (possibly with spaces in between).

In other words, you want to do what a command line shell does. You really should just reuse something. Failing that, you should capture a field at a time, with a regex something like:

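^ *('[^']*'|[^ ]+)(.*)

The quoted alternative comes first so that a field starting with a quote is not consumed by the unquoted branch. Append group 1 to your list, then continue the loop with the contents of group 2.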
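A minimal Perl sketch of that field-at-a-time loop might look like the following. It lists the quoted alternative first in the pattern, so that a leading quote is not swallowed by the unquoted branch. The name `fields_one_at_a_time` is hypothetical, and stripping the surrounding quotes is an extra step assumed here, not something the answer asks for.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the capture-one-field-and-loop idea.
sub fields_one_at_a_time {
    my ($rest) = @_;
    my @fields;
    # Group 1 is the next field, group 2 is whatever follows it.
    while ( $rest =~ /^ *('[^']*'|[^ ]+)(.*)/s ) {
        my ( $field, $tail ) = ( $1, $2 );
        $field =~ s/^'(.*)'\z/$1/s;    # assumption: drop surrounding quotes
        push @fields, $field;
        $rest = $tail;                 # continue with the remainder
    }
    return @fields;
}

print join( '|', fields_one_at_a_time("abcd efgh 'ijklm no pqrs' tuv") ), "\n";
# abcd|efgh|ijklm no pqrs|tuv
```

Each iteration consumes at least one character, so the loop stops once only spaces (or nothing) remain.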

A single pass through a regex wouldn't be able to capture an arbitrarily large number of fields. You might be able to split on a regex (Python will do this, not sure about Perl), but since you are matching the stuff outside the spaces, I'm not sure that is even an option.
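For what it's worth, Perl's `split` does take a pattern, and one way it could be made to work is a lookahead that only accepts whitespace followed by an even number of quotes. This is a sketch under the assumption that quotes are always balanced and never escaped; `split_outside_quotes` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace that lies outside quotes.
sub split_outside_quotes {
    my ($text) = @_;
    # Whitespace counts as a separator only if the rest of the string
    # holds an even number of single quotes, i.e. we are not inside one.
    my @fields = split /\s+(?=(?:[^']*'[^']*')*[^']*\z)/, $text;
    s/^'(.*)'\z/$1/s for @fields;    # split keeps the quotes; strip them
    return @fields;
}

print join( '|', split_outside_quotes("abcd efgh 'ijklm no pqrs' tuv") ), "\n";
# abcd|efgh|ijklm no pqrs|tuv
```

The lookahead rescans the rest of the string at every candidate separator, so this is more of a curiosity than a recommendation for long inputs.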

Mark Santesson 2010-03-17 03:41:28

Answer 3

+3 A:

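use strict; use warnings;

my $text = "abcd efgh 'ijklm no pqrs' tuv 'xwyz 1234 9999' 'blah'";
my @out;

# Splitting on the quote character alternates unquoted and quoted chunks:
# even indices lie outside quotes, odd indices lie inside.
my @parts = split /'/, $text;

for my $i ( 0 .. $#parts ) {
    if ( $i % 2 ) {
        push @out, $parts[$i];               # quoted chunk: keep it whole
    }
    else {
        # split ' ' skips leading whitespace, so no empty fields appear
        push @out, split ' ', $parts[$i];
    }
}

use Data::Dumper;
print Dumper \@out;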

2010-03-17 04:03:03
