ansaurus

Question

Answer 1

A:

Cooked something up :). It works only for your example, though; it does not generalize.

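use strict;
use warnings;

my $info = "Caine Michael Actor /* info data */";
# Capture the comment body, without the /* */ delimiters or surrounding spaces.
if ($info =~ m{/\*\s*(.*?)\s*\*/})
{
    my $temp = $1;
    # Protect whitespace inside the comment with a ## placeholder so split leaves it alone.
    $temp =~ s{\s+}{##}g;
    # Replace the whole comment with its protected body.
    $info =~ s{/\*\s*(.*?)\s*\*/}{$temp};
}
my @personal = split(/ /, $info);
foreach (@personal)
{
    # Turn the placeholders back into spaces.
    s{##}{ }g;
    print "$_\n";
}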

Output:

C:\>perl a.pl
Caine
Michael
Actor
info data

codaddict 2010-01-11 07:39:16

@codaddict, thanks a lot for your detailed reply. It solves my case. It's MAGIC.

Nano HE 2010-01-11 09:43:55

Answer 2

+2 A:

It really is better to use a regex for this:

$info = "Caine Michael Actor /* info data */";
$info =~ /(\w+)\s+(\w+)\s+(\w+).*\/\*(.+)\*\//;
@personal = ($1, $2, $3, $4);

Mainly because your input string has ambiguities related to word separators not easily handled by split.

In case you're wondering how to read the regex:

/
    (\w+)   # CAPTURE a sequence of one or more word characters into $1
    \s+     # MATCH one or more white space
    (\w+)   # CAPTURE a sequence of one or more word characters into $2
    \s+     # MATCH one or more white space
    (\w+)   # CAPTURE a sequence of one or more word characters into $3
    .*      # MATCH zero or more of anything
    \/\*    # MATCH the opening of C-like comment /*
    (.+)    # CAPTURE a sequence of one or more of anything into $4
    \*\/    # MATCH the closing of C-like comment */
/x

slebetman 2010-01-11 08:02:29

Avoid leaning toothpick syndrome by using a different delimiter, and assign the match to `@personal`. Don't forget to check whether `@personal` was populated: `if ( @personal = $info =~ m!...! )`. You should also anchor the pattern.

Sinan Ünür 2010-01-11 10:26:50

You don't really want to match \w+ there. You don't care what the characters are as long as they aren't whitespace (that is, you don't care if they are Perl identifier characters), so you should match \S+.

brian d foy 2010-01-11 10:59:40

Better would be `if (@personal = $info =~ /.../) { ... }`. **Never use `$1` and friends unconditionally!**

Greg Bacon 2010-01-11 14:17:18
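
The suggestions in the comments above can be combined into one pattern. The following is only a sketch under assumptions: `parse_info` is a hypothetical helper name that does not appear in the thread, and the pattern assumes exactly three whitespace-separated fields followed by one comment. It uses `m{...}x` so the slashes need no escaping, anchors both ends, matches `\S+` instead of `\w+`, and reads the captures only when the match succeeded.

```perl
use strict;
use warnings;

# parse_info is a hypothetical helper name, not something from the thread.
sub parse_info {
    my ($info) = @_;
    # In list context a successful match returns the four captures;
    # a failed match returns the empty list.
    my @personal = $info =~ m{
        \A \s*
        (\S+) \s+                # first field
        (\S+) \s+                # second field
        (\S+) \s+                # third field
        /\* \s* (.+?) \s* \*/    # comment body, trimmed
        \s* \z
    }x;
    return @personal;
}

my $info = "Caine Michael Actor /* info data */";
if (my @personal = parse_info($info)) {
    print "$_\n" for @personal;
}
else {
    warn "unexpected input: $info\n";
}
```

For the sample string this prints the four fields one per line, the last being `info data`. A string with fewer than three fields, or without a comment, yields an empty list and takes the `else` branch.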

Answer 3

+4 A:

Alternative approach:

Have you considered using the 3-parameter version of split?

$info = "Caine Michael Actor /* info data */";
@personal= split(' ',$info,4);

resulting in

@personal=('Caine','Michael','Actor','/* info data */');

Then you would still have to strip the comment delimiters (/ * and * /) from the last element to get your result.

lexu 2010-01-11 08:12:47

sigh, I can't get the slash-asterisk and asterisk-slash to show up..

lexu 2010-01-11 08:14:29

Hi Lexu, thank you for your reply. I had never considered using the 3-parameter version of split before. You taught me more about split().

Nano HE 2010-01-11 09:53:17
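
The remaining step in Answer 3, removing the comment delimiters from the fourth element, might look like the sketch below. `split_info` is a hypothetical helper name, and the code assumes the comment is always the fourth field and spans the rest of the string.

```perl
use strict;
use warnings;

# split_info is a hypothetical helper name used only for illustration.
sub split_info {
    my ($info) = @_;
    # The ' ' pattern splits on runs of whitespace and skips leading whitespace;
    # the limit of 4 leaves everything after the third field in one piece.
    my @personal = split ' ', $info, 4;
    # Strip the /* */ delimiters and surrounding spaces from the last field, if any.
    $personal[3] =~ s{\A/\*\s*(.*?)\s*\*/\s*\z}{$1}s if defined $personal[3];
    return @personal;
}

print join('|', split_info("Caine Michael Actor /* info data */")), "\n";
```

For the sample string this prints `Caine|Michael|Actor|info data`. If the fourth field is not a complete comment, the substitution does not match and the field is left unchanged.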

Answer 4

+1 A:

Since there isn't an answer yet that handles the general case, here goes:

split isn't your best bet here. Since the delimiter can be both a matched and a non-matched character, it is clearest to invert the problem and describe what you do want to match, which in this case is either a run of non-space characters or the contents of a C-style comment.

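use strict;
use warnings;
use feature 'say';  # say is not available without this (or use 5.010)

my $info = "Caine Michael Actor /* info data */";
# Each match fills one of the two capture groups; grep drops the undefined one.
my @personal = grep {defined} $info =~ m! /\* \s* (.+?) \s* \*/ | (\S+) !xg;

say join ', ' => @personal;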

That will return a list of the words and comment contents, in whatever order they appear in the string. The syntax highlighter doesn't highlight the above regex properly; the regex is everything between the two ! delimiters.

Eric Strom 2010-01-12 03:08:21
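
To illustrate the "any sequence" claim in Answer 4, here is a small sketch that runs the same pattern on a string with comments at the start and in the middle. The input string and the `tokens` helper name are invented for illustration and are not from the thread.

```perl
use strict;
use warnings;

# tokens is a hypothetical helper wrapping the pattern from Answer 4.
sub tokens {
    my ($info) = @_;
    # With /g in list context every match contributes both capture groups,
    # exactly one of which is defined; grep keeps only the defined ones.
    return grep { defined } $info =~ m! /\* \s* (.+?) \s* \*/ | (\S+) !xg;
}

# Sample input invented for this example.
my @fields = tokens("/* info data */ Caine Michael /* more data */ Actor");
print join(', ', @fields), "\n";
```

This prints `info data, Caine, Michael, more data, Actor`. Note that an unterminated comment falls through to the `\S+` branch, so its `/*` and the words after it come back as separate tokens, and because `.` does not match a newline without `/s`, a comment that spans lines is not recognized as one.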


split function extension
