ansaurus

Question

How can I delete characters between < and > in Perl?

Answer 1

+6 A:

You may want to check out the Perl module Text::Balanced, which is part of the core distribution; I think it will help here. Generally, you want to avoid using regexes for this sort of thing if the subject text is likely to contain an inner set of delimiters, because it can get very messy.

Danny 2009-04-10 14:24:24
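
A minimal sketch of the Text::Balanced suggestion above, assuming the goal is to drop every balanced <...> group, nested ones included. The helper name strip_angle_groups is hypothetical and not part of the module; only extract_bracketed comes from Text::Balanced. An unbalanced < is kept literally, which is one possible policy among several.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: remove every balanced <...> group, nested or not.
sub strip_angle_groups {
    my ($text) = @_;
    my $out = '';
    while (length $text) {
        # Copy everything up to the next '<' verbatim.
        if ($text =~ s/\A([^<]+)//) {
            $out .= $1;
            next;
        }
        # $text now starts with '<'; try to pull off one balanced group.
        my ($group, $rest) = extract_bracketed($text, '<>');
        if (defined $group && length $group) {
            $text = $rest;                      # discard the whole group
        }
        else {
            $out .= substr($text, 0, 1, '');    # unbalanced '<': keep it
        }
    }
    return $out;
}

print strip_angle_groups('a<abc<def>ghi>b'), "\n";   # ab
print strip_angle_groups('x <b>y</b> z'), "\n";      # x y z
```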

Good advice, but not needed in this case. Will definitely keep in mind though.

rlbond 2009-04-10 20:55:10

Answer 2

+4 A:

In Perl:

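#!/usr/bin/perl
use strict;
use warnings;

# Slurp all input so that tags spanning several lines are handled too.
my $text = do { local $/; <> };
$text =~ s/<[^>]*>//g;    # delete each <...> span
print $text;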

The regex matches everything from a < through the first following > (inclusive) and replaces it with nothing. The g modifier makes the substitution global, so every match is replaced rather than only the first.

EDIT: incorporated comments from Hynek and chaos

CoverosGene 2009-04-10 14:28:46
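
To illustrate the effect of the g modifier described above, here is a small sketch; the sample string and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $input = 'a <b>bold</b> and <i>italic</i> word';

# Without /g only the first match is removed.
(my $once = $input) =~ s/<[^>]*>//;

# With /g every match is removed.
(my $all = $input) =~ s/<[^>]*>//g;

print "$once\n";   # a bold</b> and <i>italic</i> word
print "$all\n";    # a bold and italic word
```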

+1 Nice (complete) example!

Andrew Hare 2009-04-10 14:36:16

It's a little bit inefficient to split it and join it again. Try: perl -0777 -pe 's/<[^>]*>//gm'

Hynek -Pichi- Vychodil 2009-04-10 14:38:22

The /m modifier isn't helping. It means 'treat as multiline', i.e. match ^ and $ at embedded newlines, not 'this input is multiline'. /s, 'treat as single line', which lets . match a newline, is closer to what you'd want, but you don't need it because your pattern doesn't use . at all; the negated character class [^>] already matches newlines.

chaos 2009-04-10 14:46:48
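
A small sketch of the point about /m and /s made above, using a made-up input in which a tag spans a newline. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $input = "keep <a\nhref='x'> this";

# Negated character class: [^>] matches newlines, no modifier needed.
(my $class = $input) =~ s/<[^>]*>//g;

# Dot without /s: . does not match a newline, so nothing is removed.
(my $dot = $input) =~ s/<.*?>//g;

# Dot with /m: /m only changes ^ and $, so still nothing is removed.
(my $dot_m = $input) =~ s/<.*?>//gm;

# Dot with /s: . now matches a newline and the tag is removed.
(my $dot_s = $input) =~ s/<.*?>//gs;

print "class: $class\n";
print "dot_s: $dot_s\n";
print "dot unchanged\n"   if $dot   eq $input;
print "dot_m unchanged\n" if $dot_m eq $input;
```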

I would put both angle brackets in the negated character class: s/<[^<>]*>//g. Otherwise, you could match from <here <to here>, which probably isn't what you want.

Alan Moore 2009-04-10 18:16:20
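
The difference between the two character classes mentioned above can be seen in a short sketch; the input string is borrowed from the comment and the variable names are made up.

```perl
use strict;
use warnings;

my $input = 'from <here <to here> on';

# [^>] happily runs over the second '<', so the match starts at the first one.
(my $loose = $input) =~ s/<[^>]*>//g;

# [^<>] refuses to cross a '<', so only the innermost <...> is removed.
(my $tight = $input) =~ s/<[^<>]*>//g;

print "[$loose]\n";   # [from  on]
print "[$tight]\n";   # [from <here  on]
```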

Very useful. Chaos's answer, however, is more adaptable to multi-character delimiters, i.e. using . and /s rather than [^(delimiter)]. +1 for great advice though.

rlbond 2009-04-10 20:56:59

Answer 3

A:

An inefficient one-liner:

perl -0777 -pe 's/<.*?>//gs'

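The same as a program:

local $/;                  # slurp mode: read all input at once
my $text = <>;
$text =~ s/<.*?>//gs;      # bind to $text; a bare s/// would act on $_
print $text;

How well that works depends on how big the text you want to convert is. Here is a more efficient one-liner that consumes the input line by line: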

perl -pe 'if ($a) {(s/.*?>// and do {s/<.*?>//g; $a = s/<.*//s;1}) or $_=q{}} else {s/<.*?>//g; $a = s/<.*//s}'

The same as a program:

my $a;
while (<>) {
    if ($a) {
        if (s/.*?>//) {
            s/<.*?>//g;
            $a = s/<.*//s;
        }
        else { $_ = q{} }
    }
    else {
        s/<.*?>//g;
        $a = s/<.*//s;
    }
    print;
}

Hynek -Pichi- Vychodil 2009-04-10 14:40:56

As noted re CoverosGene's answer, /m isn't necessary or helpful.

chaos 2009-04-10 14:48:21

Yes, you are right.

Hynek -Pichi- Vychodil 2009-04-10 15:06:37

Answer 4

+4 A:

local $/;                  # slurp mode: read all input at once
my $text = <>;
$text =~ s/<.*?>//gs;      # bind to $text; a bare s/// would act on $_
print $text;

chaos 2009-04-10 14:51:00

If your string looks like this: <abc<def>ghi>, your regex leaves 'ghi>'. If nested or escaped brackets and other perverse cases "never happen" the regex is fine. To handle the perverse cases, use Text::Balanced, even though the interface is weird.

daotoad 2009-04-10 16:26:01
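
The nested case described above can be reproduced in a few lines. As one alternative to Text::Balanced (a swapped-in technique, not something proposed in this thread), the sketch also shows a recursive pattern using (?R), which assumes Perl 5.10 or later. The sample string and variable names are made up.

```perl
use strict;
use warnings;
use 5.010;

my $input = 'x<abc<def>ghi>y';

# The non-greedy pattern stops at the first '>', leaving the tail behind.
(my $lazy = $input) =~ s/<.*?>//gs;

# Recursive pattern: a group is '<', then non-bracket runs or nested groups, then '>'.
(my $nested = $input) =~ s/<(?:[^<>]++|(?R))*>//g;

say $lazy;     # xghi>y
say $nested;   # xy
```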

Answer 5

A:

You might find "How can I remove text within parentheses with a regex?" helpful.

daotoad 2009-04-10 16:32:22
