ansaurus

Question

[Perl]: Read directory and files, and regex

Answer 1

A:

If I have understood correctly, you simply need to read a file, and find two values. These values are the series of digits after the word 'fin' and after the word 'debut'. Right now, you are trying to match on these by looking for something that occurs before the string you are interested in. Perhaps you should be looking for the actual information of interest.

In a regular expression, it is almost always better to look for interesting text rather than try to skip non-interesting text. Something like the following will work better.

Note that I've changed your file-reading loop, because you were reading each line into a variable and then processing $_, which is almost certainly not what you meant.

while (my $line = <FILE>) #read each line from FILE.
{
    chomp ($line);

    # These two lines could be combined but this is a little clearer.
    # Matching against [0-9] because \d matches all unicode digits.
    my ($fin_digits) = $line =~ /fin\s+([0-9]+)/;   
    my ($debut_digits) = $line =~ /debut\s+([0-9]+)/; # as above.

    # Continue processing below...
}
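
To make the comment about \d concrete, here is a small sketch comparing \d with [0-9] on a non-ASCII digit. The helper name matches is only for illustration, and the /a modifier in the last line assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# U+0663 is ARABIC-INDIC DIGIT THREE: a digit to Unicode, but not an ASCII digit.
my $arabic_three = "\x{0663}";

# Illustrative helper: returns 1 if the pattern matches the string, 0 otherwise.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

printf "\\d    matches U+0663: %d\n", matches($arabic_three, qr/\d/);     # 1
printf "[0-9] matches U+0663: %d\n", matches($arabic_three, qr/[0-9]/);  # 0
printf "\\d/a  matches U+0663: %d\n", matches($arabic_three, qr/\d/a);    # 0 (ASCII-only \d)
```

If the input is known to be plain ASCII the two behave the same, so [0-9] is mostly a way of stating the intent explicitly.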

Now, one difference is that your example data shows multiple occurrences of fin and debut in one line. If that is the case, you will need a slightly different regular expression. Let us all know if that really is the case.

UPDATE

Given that you do actually have matching pairs on the same line you might want to use something like the following. Again, I've only put in the regular expression matching and not the processing code. This code actually allows for an arbitrary number of pairs on a single line.

while (my $line = <FILE>) #read each line from FILE.
{
    chomp ($line);

    # Matching against [0-9] because \d matches all Unicode digits.
    # In scalar context (as a while condition), m//g finds one match per
    # iteration and resumes where the previous match ended, so each
    # debut/fin pair on the line is visited in order.
    while ($line =~ /debut\s+([0-9]+).+?fin\s+([0-9]+)/g)
    {
        my ($debut, $fin) = ($1, $2);
        # result processing here.
    }


}
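
For anyone who wants to try pairwise matching on its own, here is a minimal self-contained sketch. It assumes each debut is followed by its fin on the same line. The sub name extract_pairs and the shortened sample line are illustrative only, not part of the question.

```perl
use strict;
use warnings;

# Collect every (debut, fin) pair found in one line of text.
sub extract_pairs {
    my ($line) = @_;
    my @pairs;
    # Scalar-context /g: each pass of the loop resumes after the previous match.
    while ($line =~ /debut\s+([0-9]+).+?fin\s+([0-9]+)/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Shortened, hypothetical line in the same shape as the question's data.
my $sample = '(debut 144825 25345) (fin 244102 40647)), '
           . '(debut 796443 190570) (fin 145247 42663)';

for my $pair (extract_pairs($sample)) {
    my ($debut, $fin) = @$pair;
    print "debut=$debut fin=$fin\n";
}
```

Each pair is kept together in an array reference, which makes it easy to compute something per pair (fin minus debut, for example) later on.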

Nic Gibson 2009-05-12 10:28:28

Answer 2

+4 A:

You're making this way too complicated by trying to count fields and calculate offsets in the line and so forth. Assuming you're looking for matched debut/fin pairs, you can use

#!/usr/bin/perl

use strict;
use warnings;

my @elements;
while (<DATA>) {
  my $line = $_;
  push @elements, $line =~ /debut (\d+).*?fin (\d+)/g;
}

print join ',', @elements;
print "\n";
__DATA__
(champs1 (champs6 donnee_o donnee_f) [(champs2 [] (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) (fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] [])

This code generates the output

144825,244102,796443,145247

($line isn't even really needed, since m// operates on $_ by default, but I left that in there in case you actually need to do other processing on it. And push @elements, /debut (\d+).*?fin (\d+)/g; is a little more obfuscated than I feel is appropriate here.)

If you're not concerned with matching pairs, you can also use two separate arrays and replace the push line with

push @debuts, $line =~ /debut (\d+)/g;
push @fins, $line =~ /fin (\d+)/g;
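
As a rough, runnable illustration of the two-array variant, the sketch below wraps those two matches in a sub and runs them on a shortened sample line. The name extract_separately and the sample string are assumptions made for the example.

```perl
use strict;
use warnings;

# Return all debut values and all fin values from a line, as two array refs.
sub extract_separately {
    my ($line) = @_;
    my @debuts = $line =~ /debut (\d+)/g;   # list-context /g returns every capture
    my @fins   = $line =~ /fin (\d+)/g;
    return (\@debuts, \@fins);
}

# Shortened, hypothetical line in the same shape as the __DATA__ section.
my $sample = '(debut 144825 25345) (fin 244102 40647)), '
           . '(debut 796443 190570) (fin 145247 42663)';

my ($debuts, $fins) = extract_separately($sample);
print 'debuts: ', join(',', @$debuts), "\n";   # debuts: 144825,796443
print 'fins:   ', join(',', @$fins),   "\n";   # fins:   244102,145247
```

Keep in mind that with separate arrays the positions only line up as long as every debut has a matching fin.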

Dave Sherohman 2009-05-12 10:39:39

Answer 3

A:

2009-05-12 11:55:53

I've updated my answer.

Nic Gibson 2009-05-12 12:18:24

Answer 4

A:

Hi all,

Thank you to Newt and Dave Sherohman. I've fixed all the errors, but I get no output on screen. Does anyone know what is going wrong, please?

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
use CGI::Carp 'fatalsToBrowser';

my $dir = './Chemin/Fichier/';
my $meanOfLenghts1;
my $meanOfLenghts2;
my $sum2;
my $sum1;
my $file;
my $i;
my @elements;
my @elements1;
my @elements2;
my @length;
#my $length;
my @listeFichiersMem;
my $debut;
my $fin;
my $line;
my $result;
my $value;

# read all entries in the directory:
opendir DIR, $dir or die "Cannot open $dir $!";
@listeFichiersMem = grep /\.txt$/, readdir DIR;
foreach $file (@listeFichiersMem) 
{
    $i = 0;
    open FILE, $file or die $!;
    print $file . "\n";

    while ($line = <FILE>) #read each line from FILE.
    {
        chomp ($line);
        while ((($debut, $fin) = $result =~ /debut\s+([0-9]+).+?fin\s+([0-9]+)/g) and $i < 2)
        {
            $length[$i] = $fin - $debut; # Compute the length of the first segment, then the length of the second segment
            #push(@elements[$i], $length[$i]); #Push the 2 computed lengths into a table to compute the mean length of the 2 segments
            $elements[$i] -> push($length[$i]);
            $i++;
        }
    }
        close FILE;
        closedir DIR;
}

foreach $value (@elements1)
{
    $sum1 += $_;
}

foreach $value (@elements2)
{
    $sum2 += $_;
}

$meanOfLenghts1 = $sum1/2;
$meanOfLenghts2 = $sum2/2;

printf ("%d %d", $meanOfLenghts1, $meanOfLenghts2);

2009-05-12 14:34:18

I see you're using CGI::Carp; am I correct to infer from this that you're trying to run this in a browser as a CGI app? If so, check your web server's error log - I suspect it's failing because you're just printing the data without first sending HTTP headers. Adding 'print "Content-Type: text/plain\n\n";' before you print anything else should take care of that.

Dave Sherohman 2009-05-13 10:06:06
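
Separately from the CGI question, a few things in the posted script would also prevent any useful output: the regex is applied to $result rather than $line, the files are opened without the directory prefix, closedir is called inside the file loop, @elements1 and @elements2 are never filled, and the sums add $_ instead of $value. Below is one possible restructuring, offered as a sketch only. It assumes the goal is the mean of (fin - debut) per pair position across all lines; the sub names mean_lengths and read_txt_lines and the command-line argument are hypothetical. If it runs as a CGI script, the Content-Type header mentioned above would still need to be printed first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mean segment length per pair position (first pair, second pair, ...)
# over all the lines passed in.
sub mean_lengths {
    my (@lines) = @_;
    my (@sum, @count);
    for my $line (@lines) {
        my $i = 0;
        # Match against $line itself; scalar-context /g walks the pairs in order.
        while ($line =~ /debut\s+([0-9]+).+?fin\s+([0-9]+)/g) {
            $sum[$i]   += $2 - $1;   # fin minus debut
            $count[$i] += 1;
            $i++;
        }
    }
    return map { $sum[$_] / $count[$_] } 0 .. $#sum;
}

# Read every line of every .txt file in $dir.
sub read_txt_lines {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @files = grep { /\.txt$/ } readdir $dh;
    closedir $dh;    # close once, after the listing has been read

    my @lines;
    for my $file (@files) {
        # readdir returns bare names, so prepend the directory.
        open my $fh, '<', "$dir/$file" or die "Cannot open $dir/$file: $!";
        push @lines, <$fh>;
        close $fh;
    }
    return @lines;
}

# Runs only when a directory is given on the command line (hypothetical usage).
if (@ARGV) {
    print join(' ', mean_lengths(read_txt_lines($ARGV[0]))), "\n";
}
```

Dividing by the number of pairs actually seen, rather than by a fixed 2, keeps the result a true mean however many lines the files contain.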

Answer 5

+1 A:

Anyone here?

2009-05-12 20:28:05
