Please help

I am working with a file whose lines of data look like the one below. Each line is divided into four fields by '|||', so splitting it gives me four arrays. What I want is this:

  1. Check whether there is a punctuation mark in the first array and, if there is one, remember its position in the array.
  2. Go to the same position in the third array and read the number in brackets.
  3. Check whether the value at the array index given by that number is a punctuation mark.

My problem is that I cannot work out how to remember the match and its position. Can you help, please?

útil por la unión europea , a ||| by the european union , ||| () (0) (1) (3) (2) (4) () ||| (1) (2) (4) (3) (5)
+2  A: 

The pos() function can be used to report the (ending) position of a match. Example:

my $string = 'abcdefghijk';

if($string =~ /e/g)
{
  print "There is an 'e' ending at position ", pos($string), ".\n";
}

This code will print, "There is an 'e' ending at position 5." (Positions start from 0.) Combine this with the normal use of capturing parentheses and you should be able to solve your problem.
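To get the start of each match rather than its end, one option is to subtract the length of the matched text from pos(). The following is a minimal sketch, not a definitive solution: the sample text and the variable names are made up for illustration, and the text is plain ASCII so that byte and character offsets agree.

```perl
use strict;
use warnings;

# Hypothetical sample text (ASCII only).
my $text = 'by the european union , and others .';

my @starts;
while ($text =~ /([[:punct:]])/g) {
    # pos() is the offset just past the match, so subtract the match length
    # to get the offset where the match begins.
    my $start = pos($text) - length($1);
    push @starts, $start;
    print "Found '$1' starting at offset $start\n";
}
```

Using /g in a while loop walks through every punctuation mark in turn, so all positions are collected rather than only the first.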

In addition to pos(), there are also the special global arrays @- and @+ which provide the start and end offsets of each subpattern matched. Example:

my $string = 'foo bar baz';

if($string =~ /(foo) (bar) (baz)/)
{
  print "The whole match is between $-[0] and $+[0].\n",
        "The first match is between $-[1] and $+[1].\n",
        "The second match is between $-[2] and $+[2].\n",
        "The third match is between $-[3] and $+[3].\n";
}

( Thanks to Chas. Owens for jogging my memory on these; I was looking in perlre for them instead of in perlvar )
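Applied to the punctuation case, $-[0] gives the start offset of the match directly, with no arithmetic needed. This is only a sketch under assumptions: the sample line and the variable names $start and $end are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line (ASCII only).
my $text = 'by the european union , a';

my ($start, $end);
if ($text =~ /[[:punct:]]/) {
    # $-[0] is the offset where the whole match starts, $+[0] where it ends.
    ($start, $end) = ($-[0], $+[0]);
    print "Punctuation '", substr($text, $start, $end - $start),
          "' spans offsets $start to $end\n";
}
```

Note that these are character offsets within the string, not word indexes; to get a word index, the line has to be split into words first.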

John Siracusa
pos returns the position of the end of the match; for the start you need pos($string) - the_length_of_the_match (in this case pos($string) - 1).
Chas. Owens
Thanks, clarified.
John Siracusa
+4  A: 

In addition to pos(), there are @- and @+:

#!/usr/bin/perl

use strict;
use warnings;

my $string = "foo bar baz";

if ($string =~ /(foo) (bar) (baz)/) {
    print "the whole match is between $-[0] and $+[0]\n",
     "the first match is between $-[1] and $+[1]\n",
     "the second match is between $-[2] and $+[2]\n",
     "the third match is between $-[3] and $+[3]\n";
}
Chas. Owens
+1  A: 

When you have to do something in code that isn't simple, it's best to break it down into discrete steps and variables so that it is easy to understand.

So I would first split the data string into its four parts:

# The data record
my $dataRec = "útil por la unión europea , a ||| by the european union , ||| () (0) (1) (3) (2) (4) () ||| (1) (2) (4) (3) (5)";

# Split it into four parts, trimming the whitespace around each separator
my ($Native, $English, $data1, $data2) = split(/\s*\|\|\|\s*/, $dataRec);

# Split the first part into words and store the word index of the
# first punctuation mark (undef if there is none)
my @words = split(' ', $Native);
my ($puncPos) = grep { $words[$_] eq ',' } 0 .. $#words;

# If we found the punctuation mark, parse the data
my @dataList;
my $dataValue;
if ( defined $puncPos )
   {
   # Capture the contents of each pair of brackets; "()" gives an empty string
   @dataList = $data1 =~ /\(([^)]*)\)/g;

   # Use the punctuation position as the index into the array of values parsed
   $dataValue = $dataList[$puncPos];
   }

Something like that ...
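Putting the three steps from the question together might look like the sketch below. It rests on several assumptions that the question does not confirm: the numbers in the third field are taken to be 0-based word indexes into the second field, each pair of brackets is assumed to hold at most one number, and check_punct_alignment is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [source index, target index, target word] for
# every punctuation token in the first field whose aligned word in the second
# field is also punctuation.
sub check_punct_alignment {
    my ($record) = @_;
    my ($src, $tgt, $align) = split /\s*\|\|\|\s*/, $record;

    my @src_words = split ' ', $src;
    my @tgt_words = split ' ', $tgt;
    my @links     = $align =~ /\(([^)]*)\)/g;    # "()" becomes ''

    my @hits;
    for my $i (0 .. $#src_words) {
        # Step 1: is this source word punctuation?
        next unless $src_words[$i] =~ /^[[:punct:]]+$/;
        # Step 2: read the number at the same position in the third field.
        my $j = $links[$i];
        next unless defined $j && $j =~ /^\d+$/ && $j <= $#tgt_words;
        # Step 3: is the word at that index also punctuation?
        push @hits, [$i, $j, $tgt_words[$j]]
            if $tgt_words[$j] =~ /^[[:punct:]]+$/;
    }
    return @hits;
}

my $line = 'útil por la unión europea , a ||| by the european union , '
         . '||| () (0) (1) (3) (2) (4) () ||| (1) (2) (4) (3) (5)';

for my $hit (check_punct_alignment($line)) {
    my ($i, $j, $word) = @$hit;
    print "source word $i is punctuation; aligned target word $j is '$word'\n";
}
```

For the sample line, the comma is word 5 of the first field, the bracket at position 5 of the third field holds 4, and word 4 of the second field is also a comma. Empty brackets and out-of-range numbers are skipped rather than treated as errors.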

Ron

Ron Savage