
I have a long regular expression that parses a text file into various match variables.

Because the regex is written loosely for robustness, the match variables are likely to contain whitespace. I'd like to remove that whitespace systematically by iterating over the match variables.

For example, I have match variables $2 through $14 that contain some whitespace.

I could do:

my @columns = my ($serNum, $helixID, $initResName, $initChainID,
$initSeqNum, $initIcode, $endResName, $endChainID, $endSeqNum,
$endICode, $helixClass, $comment, $length) = 
($2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14);

### Remove whitespace                       
foreach my $element (0..$#columns) {
    $columns[$element] =~ s/^\s+//;
    $columns[$element] =~ s/\s+$//;
}

But this only removes the whitespace from the elements of @columns and leaves the properly named scalars ($serNum, $helixID, etc.) untouched.

Is there a way to remove the whitespace from each of the match variables before I copy them into the well-named scalars, or a way to iterate over the well-named scalars themselves and remove the whitespace there?

I presume there might be some way to do this with references.
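For what it's worth, Perl's foreach aliases its loop variable to each element of the list it iterates over, so a loop written directly over the named scalars modifies them in place, with no references needed. A minimal sketch, where the sample values are made up to stand in for the real captures:

```perl
use strict;
use warnings;

# Hypothetical sample values standing in for the captured fields.
my ($serNum, $helixID, $comment) = ('  1 ', ' H1', 'kinked helix   ');

# foreach aliases $_ to each variable in the list, so the
# substitution changes $serNum, $helixID and $comment themselves.
for ($serNum, $helixID, $comment) {
    s/^\s+|\s+$//g;    # strip leading and trailing whitespace only
}

print "[$serNum] [$helixID] [$comment]\n";    # [1] [H1] [kinked helix]
```

The same loop can also be written on one line as `s/^\s+|\s+$//g for $serNum, $helixID, $comment;`.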

+3  A: 

You can store the match variables in an array first, then strip the whitespace using map:

my @matches = ($2, $3, $4, ...);

my ($serNum, $helixID, ...) 
  = map { (my $v = $_) =~ s/^\s+|\s+$//g; $v } @matches;
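A self-contained sketch of this approach, assuming a hypothetical input line and regex (comma-separated fields rather than the real record format), with $1 playing the role of a field to ignore:

```perl
use strict;
use warnings;

# Hypothetical input line and regex; $1 holds a record tag we skip.
my $line = 'HELIX, 1 ,  H1,ILE  ';
$line =~ /^(\w+),([^,]*),([^,]*),([^,]*)$/
    or die "line did not match\n";

# Copy the captures into an array right after the successful match.
my @matches = ($2, $3, $4);

# map trims a lexical copy ($v) of each element, so @matches
# keeps its original, untrimmed values.
my ($serNum, $helixID, $initResName)
    = map { (my $v = $_) =~ s/^\s+|\s+$//g; $v } @matches;

print "[$serNum] [$helixID] [$initResName]\n";   # [1] [H1] [ILE]
```

Because map works on copies, @matches still holds ' 1 ', '  H1' and 'ILE  ' afterwards.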
eugene y
eugene, this is brilliant!! Thanks for suggesting the mighty map. Although you're right that I could modify the regex to capture data without whitespace, I wanted to use the . metacharacter to avoid putting any constraints on what kinds of characters I capture. It seemed simpler to capture everything and just remove the leading and trailing whitespace than to list all the possibilities, including data that may properly contain whitespace within it.
CmdrGuard
Or `my ($serNum, $helixID, ... ) = grep { s#^\s*|\s*$##g } @matches;` for variety.
Zaid
+1  A: 

It's refreshing to see a good level of detail in questions! It enables the community to address the problem in a much better fashion.

What I would do is move away from the list of 'well-named' scalars to a hash. This is cleaner and can reduce the number of variables the code needs.

my @matches = $data =~ m{$regex};   # Populates @matches with ( $1, $2, $3, ..)
my @labels  = qw/serNum helixID initResName .../;   # Create labels

my %record;                                 # Initialize hash
@record{@labels} = grep { s!^\s*|\s*$!!g }  # Strips out leading/trailing spaces
                   @matches[1..$#matches];  # Populate %record with array slice
                                            # Array slice of @matches needed to 
                                            # ignore the $1

# Now data can be accessed as follows:
print $record{helixID};                     # Prints the helix ID in the record
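A self-contained version of the code above, assuming a hypothetical regex and input line in place of the real ones. The hash slice `@record{@labels} = LIST` assigns the list elements to the keys in @labels, in order:

```perl
use strict;
use warnings;

# Hypothetical input and regex, standing in for the real record format.
my $data  = 'HELIX, 1 ,  H1,ILE  ';
my $regex = qr/^(\w+),([^,]*),([^,]*),([^,]*)$/;

# In list context, a match without /g returns ($1, $2, $3, ...).
my @matches = $data =~ m{$regex} or die "no match\n";
my @labels  = qw/serNum helixID initResName/;   # one label per field after $1

# Hash slice: each trimmed element goes to the label in the same position.
my %record;
@record{@labels} = grep { s!^\s*|\s*$!!g } @matches[1 .. $#matches];

print "$record{helixID}\n";    # H1
```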

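The grep part may need some explaining. grep aliases $_ to each element it examines, so the substitution trims the elements of @matches in place. It's a fancy way of avoiding the lexical copy of each string that a map call needs.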

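By its nature, grep filters lists: it returns only the elements for which the block returns true. This is why the whitespace-stripping regex had to be changed from \s+ to \s*. With \s*, the pattern always matches (at least the empty string at the end), so s/// always returns a true count and no items are filtered out.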
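A small sketch of the difference, using made-up field values where only the first one has surrounding whitespace:

```perl
use strict;
use warnings;

# Hypothetical fields: only ' H1 ' has leading/trailing whitespace.
my @plus = (' H1 ', 'ILE', 'A');
my @star = (' H1 ', 'ILE', 'A');

# With \s+, s/// returns false for fields with nothing to strip,
# so grep drops 'ILE' and 'A'.
my @kept_plus = grep { s/^\s+|\s+$//g } @plus;

# With \s*, the empty match at the end of the string always succeeds,
# so s/// always returns true and every field is kept.
my @kept_star = grep { s/^\s*|\s*$//g } @star;

print scalar(@kept_plus), " ", scalar(@kept_star), "\n";   # 1 3

# grep aliased $_ to the array elements, so @star itself was trimmed.
print "[$star[0]]\n";   # [H1]
```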

Zaid