ansaurus

Question

Answer 1

+1 A:

Try this out for size:

my @numbers = ($moyLargRectNom, $moyHautRectNom, $moyLargRectNom, $moyHautRectNom, $moyLargRectMat, $moyHautRectMat, $moyLargRectMat, $moyHautRectMat);
my @temp = split / /, $ligne;
for(@temp) {
  if(/^\W*\d\W*$/) {
    my $num = shift @numbers;
    s/\d+/$num/;
  }
}
$ligne = join " ", @temp;

That makes a list, @temp, based on the "words" (approximately) in $ligne. It makes another list, @numbers, which is a list of the numbers you want to replace in the list, in the order you want them to replace things. Then it goes through @temp, one-by-one, and if a given element is a number (i.e. matches the regex /^\W*\d\W*$/, which means it has no word characters (so it's not "champs4") and has at least one number - this will match "25346)" in addition to "25346"), and then replace the numeric part with the first value from @numbers. And now that I've tested it, I can assure you this actually works!

I believe a shorter implementation could be achieved with map, but this will work well enough for you.

Advantages of this approach to your approach:

First, this solution is scalable. To replace more than eight numbers with your solution, you would need to write a new regex. To replace more than eight numbers with my solution, just add a few more entries to @numbers. This code could be put into a subroutine that takes a string to change and a list of numbers to change, and you wouldn't have to worry about whether or not they passed the right number of numbers, or whether they have the right format.

Second, this is a bit easier to understand at cursory glance. Regexes as long as the one you were using are very hard to parse visually. Even if it works, someday someone may need to alter your code to do something different. If you use a huge regex, the rewriter (perhaps you) will simply shake their heads, highlight your code, and press delete, and then write new code to do it. With this, they can easily see what is happening in your code, and if they need to make modifications to it, they can.

Third, if you want to hardcode in a specified number of replacements to make, you can do that too:

my @numbers = ($moyLargRectNom, $moyHautRectNom, $moyLargRectNom, $moyHautRectNom, $moyLargRectMat, $moyHautRectMat, $moyLargRectMat, $moyHautRectMat);
my @temp = split / /, $ligne;
my $max_replacements = 8;
for(@temp) {
  if(/^\W*\d\W*$/) {
    my $num = shift @numbers;
    s/\d+/$num/;
    last unless --$max_replacements;
  }
}
$ligne = join " ", @temp;

As a side note (which applied earlier, but still applies), this will fail on floating point numbers - /^\W*\d\W*$/ will match floating point numbers, but s/\d+/$num/ won't replace floating point numbers, only the integer part. If you discover you need floating point numbers, change this line:

s/\d+/$num/;

To this:

s/\d+|(?:\d+)?\.\d+/$num/;

That should match floating point numbers.

Chris Lutz 2009-05-20 20:35:16

Thank you to all. Chris Lutz, your solution replace only the first value after debut, and the first value after fin, and so on for the other debut and fin, so only 4 values.

2009-05-20 21:44:02

Indeed, it does. That's what I get for not testing code before I post it. Anyway, it misses the second number because it reads "25345)" instead of "25345", and the ")" doesn't match as a number. This will be quickly rectified.

Chris Lutz 2009-05-21 02:22:40

Answer 2

A:

You seem to be doing it the opposite way that I would. i.e i'd look for the numbers and replace those, rather than what you're donig, i.e., matching the the stuff surrounding the numbers and substituting them into a string.

Will there ALWAYS be 8 values? Will they always follow the same words? if so:

.+?debut\s([\d]+)\s([\d]+).+?fin\s([\d]+)\s([\d]+).+?debut\s([\d]+)\s([\d]+).+?fin\s([\d]+)\s([\d]+)

or can debut & fin appear anywhere, and whenever they do you want to replace them as such:

debut x y -> debut $moyLargRectNom, $moyHautRectNom, fin x y -> fin $moyLargRectNom, $moyHautRectNom, (debut 144825 25345) (fin 244102 40647)

if that's true, just do it using two simple regex:

debut\s([\d]+)\s([\d]+)
fin\s([\d]+)\s([\d]+)

and replace the groups with the words..

but I can't remember which variable stores the number of groups created, sorry.

Pod 2009-05-20 20:35:29

Answer 3

+1 A:

sprintf to the rescue:

#!/usr/bin/perl

use strict;
use warnings;

my $s = <<EO_TXT;
(champs1 (champs6 donnee_o donnee_f) [(champs2 [] 
(champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) (debut 144825 25345) 
(fin 244102 40647)), (champs2 [] (champs3 _FuGNJeyyyyyyB 
(champs4 donnee_z)) (debut 796443 190570) (fin 145247 42663))] [] []).
EO_TXT

my ( 
    $moyLargRectNom, $moyHautRectNom, 
    $moyLargRectMat, $moyHautRectMat, 
) = map { "val$_" } qw( 1 2 3 4 );

my @replacements = (
    $moyLargRectNom, $moyHautRectNom,
    $moyLargRectNom, $moyHautRectNom,
    $moyLargRectMat, $moyHautRectMat,
    $moyLargRectMat, $moyHautRectMat,
);

$s =~ s/\b[0-9]+\b/%s/g; # replace %s with the appropriate specifier
$s = sprintf $s, @replacements;

print $s, "\n";

Sinan Ünür 2009-05-20 21:55:06

Answer 4

A:

I figgured your structure was too irregular or strange to be suited for a regular expression, nested expressions rarely are.

So I went in hunt of a parse-tree. Not finding one that suited, and not understanding any of the formal parse grammars, I wrote my own tokenister/state machine.

It turns your code into a data-tree which you can then extract with simple looping constructs.

Beware, code is only designed to work on your small data-set provided so far, unbalanced brackets will give parser headaches and produce a useless tree.

Skim to the bottom to see how to use this blob

#!/usr/bin/perl 

use strict;
use warnings;
use version;
use Data::Dumper;
our $VERSION = qv('0.1');

my @stack;

my $data = <<'EOF';
(champs1 
     (champs6 donnee_o donnee_f) 
     [(champs2 [] 
          (champs3 _YOJNJeyyyyyyB (champs4 donnee_x)) 
          (debut 144825 25345)
          (fin 244102 40647)
       ), 
      (champs2 [] 
          (champs3 _FuGNJeyyyyyyB (champs4 donnee_z)) 
          (debut 796443 190570) 
          (fin 145247 42663)
     )] 
     [] 
     []
)
EOF

push @stack,
  {
    tokens  => [],
    context => 'void',
  };

my $state;

my $eaten;
my $str = $data;

sub eat
{
    my $n = shift;
    substr( $str, 0, $n, '' );
}

while ( @stack && $str )
{
    $state = $stack[-1];
    my @tokens  = @{ $stack[-1]->{tokens} };
    my $context = $stack[-1]->{context};

    if ( $str =~ m{(^[\s,]+)} )
    {
        eat length($1);
        next;
    }
    if ( $str =~ m{(^\w+)} )
    {
        eat length($1);
        push @{ $stack[-1]->{tokens} }, $1;
        next;
    }
    if (    $str =~ m{^\[}
        and $context eq 'nest'
        || $context  eq 'nestgroup'
        || $context  eq 'array' )
    {
        eat 1;
        print "\e[33m[\e[0m";
        push @stack,
          {
            tokens  => [],
            context => 'array',
          };

        next;
    }

    if ( $str =~ m{^\]} and $context eq 'array' )
    {
        eat 1;
        print "\e[33m]\e[0m";
        pop @stack;
        push @{ $stack[-1]->{tokens} }, \@tokens;
        next;
    }

    if (
        $str =~ m{^\((champs(\d)|debut|fin)\s}
        and (  $context eq 'nest'
            || $context eq 'array'
            || $context eq 'nestgroup'
            || $context eq 'void' )
      )
    {
        eat length($1) + 1;
        $stack[-1]->{nodename} = $1;
        print "\e[32m($1\e[0m";
        push @stack,
          {
            tokens  => [],
            context => 'nestgroup',
          };
        next;
    }
    if ( $str =~ m{^\)} and $context eq 'nestgroup' )
    {
        eat 1;
        print "\e[32m)\e[0m";
        pop @stack;
        my $nodename = $stack[-1]->{nodename};
        push @{ $stack[-1]->{tokens} }, { $nodename, \@tokens };
        next;
    }
    if ( $str =~ m{^\(} )
    {
        eat 1;
        print "\e[31m(\e[0m";
        push @stack,
          {
            tokens  => [],
            context => 'nest',
          };
        next;
    }
    if ( $str =~ m{^\)} and $context eq 'nest' )
    {
        eat 1;
        print "\e[31m)\e[0m";
        pop @stack;
        push @{ $stack[-1]->{tokens} }, \@tokens;
        next;
    }

    print substr( $str, 0, 1 ), "\e[34m$context\e[0m";
    eat 1;
}

$Data::Dumper::Indent = 1;
$Data::Dumper::Terse  = 1;

print "Tree:\n";
print Dumper( $state->{tokens}->[0]->{champs1}->[1] );

print "--------";
for ( @{ $state->{tokens}->[0]->{champs1}->[1] } )
{
    my @data = @{ $_->{champs2} };
    print ">", Dumper( $data[2], $data[3] );
}

Output:

(champs1(champs6)[(champs2[](champs3(champs4))(debut)(fin))(champs2[](champs3(champs4))(debut)(fin))][][])
Tree:
[
  {
    'champs2' => [
      [],
      {
        'champs3' => [
          '_YOJNJeyyyyyyB',
          {
            'champs4' => [
              'donnee_x'
            ]
          }
        ]
      },
      {
        'debut' => [
          '144825',
          '25345'
        ]
      },
      {
        'fin' => [
          '244102',
          '40647'
        ]
      }
    ]
  },
  {
    'champs2' => [
      [],
      {
        'champs3' => [
          '_FuGNJeyyyyyyB',
          {
            'champs4' => [
              'donnee_z'
            ]
          }
        ]
      },
      {
        'debut' => [
          '796443',
          '190570'
        ]
      },
      {
        'fin' => [
          '145247',
          '42663'
        ]
      }
    ]
  }
]
--------
>{
  'debut' => [
    '144825',
    '25345'
  ]
}
{
  'fin' => [
    '244102',
    '40647'
  ]
}
>{
  'debut' => [
    '796443',
    '190570'
  ]
}
{
  'fin' => [
    '145247',
    '42663'
  ]
}

Kent Fredric 2009-05-21 00:44:43

ansaurus

tags:

views:

answers:

Find and replace patterns in a string

related questions