ansaurus

Question

How to truncate a string using a regular expression in Perl

Answer 1

+4 A:

$ perl -pe 's/(<value>[^<]{1,6})[^<]*/$1/' shortstring.in
<value>1234@g</value>
<value>1235@g</value>

In the context of the snippet from your question, use

while (<$input_handle>) {
  s!(<value>)(.*?)(</value>)!$1 . substr($2,0,6) . $3!e
    if /(\d+\@google\.com)/;
  print $output_handle $_;
}

or to do it with a single pattern

while (<$input_handle>) {
  s!(<value>)(\d+\@google\.com)(</value>)!$1 . substr($2,0,6) . $3!e;
  print $output_handle $_;
}

Using bangs as the delimiters on the substitution operator prevents Leaning Toothpick Syndrome in </value>.
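
As a minimal sketch of the delimiter point, the two subs below (hypothetical names, introduced only for illustration) perform the same substitution, once with slash delimiters and once with bangs. The sample input is an assumption modelled on the demo data.

```perl
use strict;
use warnings;

# With slash delimiters, the slash in </value> has to be escaped.
sub truncate_with_slashes {
    my ($line) = @_;
    $line =~ s/(<value>)(.*?)(<\/value>)/$1 . substr($2, 0, 6) . $3/e;
    return $line;
}

# With bang delimiters, the same pattern needs no escaping.
sub truncate_with_bangs {
    my ($line) = @_;
    $line =~ s!(<value>)(.*?)(</value>)!$1 . substr($2, 0, 6) . $3!e;
    return $line;
}

my $line = '<value>1234@google.com</value>';
print truncate_with_slashes($line), "\n";
print truncate_with_bangs($line), "\n";
```

Both calls should print <value>1234@g</value>; the only difference is how readable the pattern is.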

NOTE: The usual warnings about “parsing” XML with regular expressions apply.

Demo program:

#! /usr/bin/perl

use warnings;
use strict;

my $input_handle = \*DATA;
open my $output_handle, ">&=", \*STDOUT or die "$0: open: $!";

while (<$input_handle>) {
   s!(<value>)(\d+\@google\.com)(</value>)!$1 . substr($2,0,6) . $3!e;
  print $output_handle $_;
}

__DATA__
<value>1234@google.com</value>
<value>1235@google.com</value>
<value>12@google.com</value>

Output:

$ ./prog.pl 
<value>1234@g</value>
<value>1235@g</value>
<value>12@goo</value>

Greg Bacon 2010-08-03 19:28:48

I think my code is not correct; I only want to truncate the data between <value> and </value>.

2010-08-03 19:30:41

Why do you think it's not correct?

Paul Tomblin 2010-08-03 19:31:45

Yours does not work. In the end I used this: s/(<value>.{1,$truncate_num}).*(<.*)/$1$2/;

2010-08-04 00:07:25

@gbacon Thanks for the updated s!!!e syntax. Someone else had posted and then deleted that, but it didn't include the "<value>" tags. I had never used s!!!e before and was curious how it would have looked if done correctly.

David Blevins 2010-08-04 00:10:42

@David Perl is flexible about the delimiters on the `s///` operator. Using bangs meant I didn't have to escape the slash in `</value>`.

Greg Bacon 2010-08-04 01:47:27

@lilili08 See updated answer for a working program that includes the code I suggested. What's the output you're seeing?

Greg Bacon 2010-08-04 01:59:11

@gbacon I knew about the various separators; the 'e' flag is the gem I've wanted several times and did not know how to use. Loving stackoverflow.

David Blevins 2010-08-04 02:36:57

Answer 2

+1 A:

Looks like you want to truncate the text inside the tag which could be shorter than 6 characters already, in which case:

s/(<value>[^<]{1,6})[^<]*/$1/

David Blevins 2010-08-03 19:29:49
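
A short sketch of how the pattern in this answer behaves on long and short values. The sub name truncate_value and both sample lines are assumptions made for illustration.

```perl
use strict;
use warnings;

sub truncate_value {
    my ($line) = @_;
    # Keep the tag plus up to six non-'<' characters,
    # then drop whatever else precedes the next '<'.
    $line =~ s/(<value>[^<]{1,6})[^<]*/$1/;
    return $line;
}

print truncate_value($_), "\n"
    for '<value>1234@google.com</value>', '<value>123</value>';
```

The first line should come out as <value>1234@g</value>, while the second is left unchanged because its content is already shorter than six characters.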

Answer 3

A:

s/<value>(.{1,6}).*/<value>$1<\/value>/;

Paul Tomblin 2010-08-03 19:29:53

With the . in (.{1,6}) you could get stuff like '123</v'

David Blevins 2010-08-03 19:49:14

@David, no, because he's already tested to make sure the tag has `@google.com`, so it can't be smaller than that. If you want to be more careful, you could test for the closing tag, but since parsing XML or HTML with a regex is a REALLY REALLY BAD IDEA anyway, I don't want to give him any ideas.

Paul Tomblin 2010-08-03 20:37:30
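
The concern raised in the comments above can be checked with a small sketch. The sub name dot_truncate is hypothetical, bang delimiters are used here so the closing tag needs no escaping, and the short input is an assumed case that the earlier `@google.com` test would normally exclude.

```perl
use strict;
use warnings;

sub dot_truncate {
    my ($line) = @_;
    # '.' also matches '<', so a short value lets the capture
    # run into the closing tag.
    $line =~ s!<value>(.{1,6}).*!<value>$1</value>!;
    return $line;
}

print dot_truncate('<value>1234@google.com</value>'), "\n";
print dot_truncate('<value>123</value>'), "\n";
```

The long value should be truncated to <value>1234@g</value>, while the short one becomes <value>123</v</value>, which is the '123</v' capture described in the first comment.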

Answer 4

+8 A:

Use this instead (regex is not the only feature of Perl, and it's overkill for this) :-)

$str = substr($str, 0, 6);

http://perldoc.perl.org/functions/substr.html

bowenl2 2010-08-03 19:30:42
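
A minimal sketch of the substr approach on a bare string, assuming the value has already been extracted from its tag. The sub name first_n and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

sub first_n {
    my ($str, $n) = @_;
    # substr returns the whole string when it is shorter than $n.
    return substr($str, 0, $n);
}

print first_n('1234@google.com', 6), "\n";
print first_n('12', 6), "\n";
```

This should print 1234@g and then 12; note that it truncates the string itself and does not look for the surrounding tags.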

Answer 5

+1 A:

Try this:

s|(?<=<value>)(.*?)(?=</value>)|substr $1,0,6|e;

eugene y 2010-08-03 19:31:20
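
A sketch of the lookaround version wrapped in a sub, so the limit can be passed in. The name truncate_between_tags and the $limit parameter are assumptions added for illustration. Because the tags are matched by lookbehind and lookahead, they are not part of the match, and the replacement only has to produce the shortened content.

```perl
use strict;
use warnings;

sub truncate_between_tags {
    my ($line, $limit) = @_;
    # Only the text between the tags is matched and replaced.
    $line =~ s|(?<=<value>)(.*?)(?=</value>)|substr $1, 0, $limit|e;
    return $line;
}

print truncate_between_tags('<value>1234@google.com</value>', 6), "\n";
print truncate_between_tags('<value>12</value>', 6), "\n";
```

The expected results are <value>1234@g</value> and <value>12</value>.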
