My Perl script takes a string and cleans it up by running a series of search-and-replace operations on it, like so:
$text =~ s/<[^>]+>/ /g;
$text =~ s/\s+/ /g;
$text =~ s/[\(\{\[]\d+[\)\}\]]/ /g; # (12), {12} or [12] => space
$text =~ s/\s+[<>]+\s+/\. /g;
$text =~ s/\s+/ /g;
$text =~ s/\.*\s*[\*|\#]+\s*([A-Z\"])/\. $1/g; # replace . **** Begin or . #### Begin or ) *The
$text =~ s/\.\s*\([^\)]*\) ([A-Z])/\. $1/g; # . (blah blah) S... => . S...
As you can see, I'm dealing with nasty HTML and have to beat it into submission.
I'm hoping there is a simpler, more aesthetically appealing way to do this. I have about 50 lines that look just like the ones above.
I have solved one version of this problem by using a hash where the key is the comment and the value is the regular expression, like so:
%rxcheck = (
'time of day'=>'\d+:\d+',
'starts with capital letters then a capital word'=>'^([A-Z]+\s)+[A-Z][a-z]',
'ends with a single capital letter'=>'\b[A-Z]\.'
);
And this is how I use it:
foreach my $key (keys %rxcheck) {
if($snippet =~ /$rxcheck{ $key }/){ blah blah } # no /g: in a condition it would carry pos() over to the next pattern
}
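A minimal runnable sketch of that lookup follows. The sample $snippet is made up for illustration, and storing the patterns precompiled with qr// (rather than as plain strings) is an assumption, not something the snippet above requires.

```perl
use strict;
use warnings;

# Comment => precompiled pattern. Using qr// here is an assumption;
# the hash above stores the patterns as plain strings.
my %rxcheck = (
    'time of day' => qr/\d+:\d+/,
    'starts with capital letters then a capital word' => qr/^([A-Z]+\s)+[A-Z][a-z]/,
    'ends with a single capital letter' => qr/\b[A-Z]\./,
);

# Hypothetical sample input, for illustration only.
my $snippet = 'BREAKING NEWS Today at 10:30 we met Mr. X.';

my @matched;
foreach my $key (sort keys %rxcheck) {
    # No /g: in scalar context it would leave pos() set for the next pattern.
    push @matched, $key if $snippet =~ $rxcheck{$key};
}

print "$_\n" for @matched;
```

Sorting the keys only makes the output order predictable; a plain hash has no guaranteed order.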
The problem comes up when I try my hand at a hash where the key is the expression and the value is what I want to replace it with, and the replacement contains a $1 or $2.
%rxcheck2 = (
'(\w) \"'=>'$1\"'
);
The above is to do this:
$snippet =~ s/(\w) \"/$1\"/g;
But I can't seem to pass the "$1" part into the regex literally. (I think that's the right word; it seems the $1 is being interpreted even though I used ' marks.) So I end up with:
if($snippet =~ /$key/$rxcheck2{ $key }/g){ }
And that doesn't work.
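One possible way around this, offered as a sketch rather than the definitive fix: a replacement string pulled out of a hash is interpolated into s/// only once, so a stored '$1' reaches the output as the literal characters $1 instead of the captured text. Storing the replacement as an anonymous sub and running it with the /e modifier defers the $1 lookup until a match has actually happened. The sample $snippet below is hypothetical.

```perl
use strict;
use warnings;

# Pattern string => replacement callback. The callback reads $1 at call time,
# i.e. after the match has succeeded.
my %rxcheck2 = (
    '(\w) "' => sub { "$1\"" },   # drop the space between a word character and a quote
);

# Hypothetical sample input, for illustration only.
my $snippet = 'end of quote " here';

foreach my $key (keys %rxcheck2) {
    my $replace = $rxcheck2{$key};
    # /e evaluates the replacement side as Perl code, so the callback runs per match.
    $snippet =~ s/$key/$replace->()/ge;
}

print "$snippet\n";   # prints: end of quote" here
```

Because hash keys come back in no particular order, this form only suits rules that do not depend on one another.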
So 2 questions:
Easy: How do I handle a large number of regexes in an easily editable way, so I can change and add them without just copying and pasting the line before?
Harder: How do I handle them using a hash, if that is in fact the easiest way to do this? Or an array, if I have multiple pieces I want to include per rule, say 1) the part to search for, 2) the replacement, 3) a comment, and 4) modifiers such as global or case-insensitive.
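To make the second question concrete, here is one shape such a rule table could take. It is only a sketch: the field names (comment, search, replace, global), the clean_text helper, and the sample input are all made up for illustration. Case-insensitivity is folded into the qr// pattern itself, while the global flag is kept as a separate field because /g belongs to the substitution, not to the pattern.

```perl
use strict;
use warnings;

# An ordered list of rules; each rule is a small hash (field names are hypothetical).
my @rules = (
    { comment => 'strip HTML tags',       search => qr/<[^>]+>/,  replace => sub { ' ' },    global => 1 },
    { comment => 'collapse whitespace',   search => qr/\s+/,      replace => sub { ' ' },    global => 1 },
    { comment => 'no space before quote', search => qr/(\w) "/,   replace => sub { "$1\"" }, global => 1 },
    { comment => 'first "teh", any case', search => qr/\bteh\b/i, replace => sub { 'the' },  global => 0 },
);

# Apply every rule in order and return the cleaned text.
sub clean_text {
    my ($text) = @_;
    for my $rule (@rules) {
        my ($search, $replace) = @{$rule}{qw(search replace)};
        if ($rule->{global}) {
            $text =~ s/$search/$replace->()/ge;   # every occurrence
        }
        else {
            $text =~ s/$search/$replace->()/e;    # first occurrence only
        }
    }
    return $text;
}

# Hypothetical sample input, for illustration only.
my $cleaned = clean_text('Teh   end of <b>quote</b> " here');
print "$cleaned\n";   # prints: the end of quote" here
```

With this layout, adding a rule means adding one line to @rules, and the array keeps the rules in the order they were written, which matters when later rules rely on earlier ones (for example, collapsing whitespace after the tags are gone).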
Thanks for your help.