I'm trying to write some Perl to convert some HTML-based text over to MediaWiki format and hit the following problem: I want to search and replace within a delimited subsection of some text and wondered if anyone knew of a neat way to do it. My input stream is something like:

Please mail <a href="mailto:[email protected]&amp;Subject=Please help&amp;Body=Please can some one help me out here">support.</a> if you want some help.

and I want to change "Please help" and "Please can some one help me out here" to "Please%20help" and "Please%20can%20some%20one%20help%20me%20out%20here" respectively, without changing any of the other spaces on the line.

Naturally, I also need to be able to cope with more than one such link on a line so splicing isn't such a good option.

I've taken a good look round Perl tutorial sites (it's not my first language) but didn't come across anything like this as an example. Can anyone advise an elegant way of doing this?

A: 

Why don't you just search for "Body=", read up to the closing quote, and replace every space in between with %20?

I would not even use regular expressions for that, since I don't find them useful for anything except mass changes where everything on the line is changed.

A simple loop might be the best solution.
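
A minimal sketch of this loop-based idea, assuming every link looks like href="mailto:..." with a double-quoted value. It scans from mailto: to the closing quote rather than only from Body=, so that the Subject text is covered too. The helper name encode_mailto_spaces and the example.com addresses are hypothetical, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: encode spaces inside every href="mailto:..." value,
# leaving all other spaces on the line untouched.
sub encode_mailto_spaces {
    my ($line) = @_;
    my $marker = 'href="mailto:';   # assumed start delimiter
    my $pos    = 0;

    while ((my $start = index($line, $marker, $pos)) >= 0) {
        my $from = $start + length $marker;
        my $end  = index($line, '"', $from);   # closing quote of the attribute
        last if $end < 0;                      # unterminated attribute: stop

        my $value   = substr($line, $from, $end - $from);
        my $encoded = join '%20', split / /, $value, -1;

        # Splice the encoded value back in, then continue after the closing quote.
        substr($line, $from, $end - $from, $encoded);
        $pos = $from + length($encoded) + 1;
    }
    return $line;
}

print encode_mailto_spaces(
    'Mail <a href="mailto:a@example.com&amp;Subject=Please help">us</a> or '
  . '<a href="mailto:b@example.com&amp;Body=one two three">them</a> for help.'
), "\n";
```

Because the loop resumes after each closing quote, it copes with more than one link on a line.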

Jazz
If you "dont find them useful for anything except mass changes where everything on the line is changes," then I suggest you should learn about more nuanced uses of them to take advantage of their power.
Andy Lester
+5  A: 

Your task has two parts. The first is to find and replace the mailto URIs; use an HTML parsing module for that. This topic is covered thoroughly on Stack Overflow.

The other part is to canonicalise the URI. The module URI is suitable for this purpose.

use strict;
use warnings;
use URI::mailto;
my @hrefs = ('mailto:[email protected]&amp;Subject=Please help&amp;Body=Please can some one help me out here');
# Constructing the URI object escapes unsafe characters such as spaces.
print URI::mailto->new($_)->as_string, "\n" for @hrefs;
__END__
mailto:[email protected]&amp;Subject=Please%20help&amp;Body=Please%20can%20some%20one%20help%20me%20out%20here
daxim
@daxim Ah, I thought you might say something like that. I was hoping it might be a nice simple one liner in Perl! What I didn't make clear was that the input is actually very old TWiki with embedded HTML tags. I'm not sure what an HTML parser would make of it but I'm going to give it, and the canonicalisation, a try. Thanks for the advice.
Robin Welch
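
For the regex one-liner hoped for in the comment above, here is a minimal sketch using a nested substitution. It assumes the href values are always double-quoted and begin with mailto:, and that the input is regular enough not to need a real HTML parser. The example.com address is a placeholder, not the address from the question.

```perl
use strict;
use warnings;

my $line = 'Please mail <a href="mailto:support@example.com&amp;Subject=Please help&amp;Body=Please can some one help me out here">support.</a> if you want some help.';

# The outer s///ge finds each mailto href. Its replacement is Perl code (/e)
# that copies the match and encodes the spaces in that copy only.
$line =~ s{(href="mailto:[^"]*")}{ (my $href = $1) =~ s/ /%20/g; $href }ge;

print "$line\n";
```

The /g flag on the outer substitution handles several links on one line, and spaces outside the matched attribute are never touched. The same substitution should also work from the command line with perl -pe, reading the text from a file or standard input.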