ansaurus

Question

Getting last occurrence of a string using regular expressions

Answer 1

+1 A:

Hi everyone! I need to parse an HTML file and I've got something like this:

Then you need an HTML parser. Regular Expressions aren't powerful enough to do it properly.

Once you've parsed the HTML and got the contents of each of your TAGs, you can use something like:

/(.*)KEY/is

to check whether the text contains KEY and, if so, to grab everything that precedes it. Because .* is greedy, the capture runs up to the last occurrence of KEY in the text.
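As a rough illustration of that pattern, here is a minimal Perl sketch. The helper name text_before_last_key and the sample string are made up for this example; it assumes the tag contents have already been extracted by a parser.

```perl
use strict;
use warnings;

# Hypothetical helper: returns everything before the LAST "KEY" in $text,
# or undef if "KEY" does not occur at all.
sub text_before_last_key {
    my ($text) = @_;
    # Greedy .* first swallows the whole string, then backtracks until
    # "KEY" can match, so the match ends at the last occurrence.
    # /s lets "." cross newlines; /i makes the match case-insensitive.
    return $text =~ /(.*)KEY/is ? $1 : undef;
}

# Made-up sample standing in for the contents of one parsed tag.
my $sample = "first KEY\nsecond key\nTEXT_TO_FIND KEY trailing";
my $before = text_before_last_key($sample);
print defined $before ? "[$before]\n" : "no KEY found\n";
```

Note that with /i the lowercase "key" also counts as an occurrence; drop the flag if only the exact spelling should match.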

Anon. 2010-02-07 20:35:17

Answer 2

A:

If you really don't want to use an HTML parser, this regexp works as long as TEXT_TO_FIND does not contain "<" or ">":

/\s*([^<>]*?)\s*?KEY/ism
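For illustration, a small Perl sketch of how this pattern behaves on raw markup. The helper name text_before_key and the sample input are assumptions made for this example, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical input with the nested-tag layout discussed in this thread.
my $html = <<'HTML';
<TAG1>
    <TAG1>
        TEXT_TO_FIND
        KEY
    </TAG1>
</TAG1>
HTML

# Hypothetical helper: returns the text found just before KEY, or undef.
sub text_before_key {
    my ($input) = @_;
    # [^<>]*? cannot cross a "<" or ">", so the capture stays inside one
    # run of text between tags; the surrounding \s* and \s*? keep the
    # leading and trailing whitespace out of the capture.
    return $input =~ /\s*([^<>]*?)\s*?KEY/ism ? $1 : undef;
}

my $found = text_before_key($html);
print defined $found ? "$found\n" : "no match\n";
```

On this sample the capture is TEXT_TO_FIND. The pattern returns the first match from the left, so it suits input where KEY appears once per text run.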

Leventix 2010-02-07 20:39:28

Thanks, this solved it! PS: Yes, I should probably use an HTML parser.

2010-02-07 21:08:38

Answer 3

A:

Use each tool in its appropriate context: find text chunks with an HTML parser, and then match against those with regular expressions.

#! /usr/bin/perl

use warnings;
use strict;

use HTML::Parser;

my $p = HTML::Parser->new(
  api_version => 3,
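  text_h => [
    # Called once per text chunk; "dtext" passes the text with entities decoded.
    sub {
      local($_) = @_;
      # Print whatever precedes the first standalone KEY in this chunk.
      print $1, "\n" if /(\S.+?)\s*\bKEY\b/s;
    },
    "dtext"
  ],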
);

# for demo only
*ARGV = *DATA;

undef $/;
$p->parse(<>);

__DATA__
<TAG1>
    <TAG1>
        TEXT_TO_FIND
        KEY
        <TAG1>
        </TAG1>
        <TAG1>
        </TAG1>
    </TAG1>
</TAG1>

Output:

$ ./find-text
TEXT_TO_FIND

Greg Bacon 2010-02-07 21:29:49
