How do I convert processing instructions into normal XML elements using Perl?

For example:

<?legalnoticestart?>
<?sourcenotestart?>
<para>Content para</para>
<?sourcenoteend?>
<?literallayoutstart?>
<para>body content</para>
<?literallayoutend?>
<?legalnoticeend?>

Required format:

<legalnotice>
<sourcenote>
<p>Content para</p>
</sourcenote>
<literallayout>
<p>body content</p>
</literallayout>
</legalnotice>

Can anyone suggest a solution using a Perl script?

Code would be appreciated.

Best Regards, Antony

+3  A: 

Oddly enough I would use XML::Twig for that:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

XML::Twig->new( twig_roots => { '#PI' => \&out_pi, },
                twig_print_outside_roots => 1,
              )
         ->parsefile( 'pi2elt.xml')
         ;

sub out_pi
  { my( $t, $pi)= @_;
    my $target= $pi->target;
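    # "foostart" becomes "foo" and "fooend" becomes "/foo"; the "or" keeps a name
    # that itself ends in "end" (e.g. "appendstart") from being rewritten twice
    $target=~ s{^(.*)start$}{$1} or $target=~ s{^(.*)end$}{/$1};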
    print "<$target>";
  }

This will go through the file, processing only the PIs (the twig_roots option) and outputting the rest unchanged (the twig_print_outside_roots option).

A few caveats: your input file needs to be well-formed XML, so it must be in UTF-8 or UTF-16, or have an XML declaration that specifies its encoding. There is also no check at all that the output is well-formed XML; you can verify it with any proper XML parser.
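
For that last point, here is a minimal sketch of such a check, assuming the converted output has been captured in a string. The `xml_error` helper and the sample string are made up for illustration and are not part of the script above. It relies on XML::Twig's `safe_parse`, which wraps the parse in an eval and leaves the error message in `$@` on failure:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns '' when $xml parses cleanly,
# otherwise the parser's error message.
sub xml_error {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    return '' if $twig->safe_parse($xml);   # safe_parse traps the parser's die
    return $@ || 'unknown parse error';
}

# Illustrative sample: what the converted fragment might look like.
my $converted = '<legalnotice><sourcenote><para>Content para</para></sourcenote></legalnotice>';
my $error     = xml_error($converted);
print $error eq '' ? "well-formed\n" : "not well-formed: $error\n";
```

Any other XML parser would do just as well here; XML::Twig is used only because the script already depends on it.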

mirod
A: 

Here's my solution (regex-based):

use strict;
use warnings;

my $string = <<'TEXT';
<?legalnoticestart?>
<?sourcenotestart?>
<para>Content para</para>
<?sourcenoteend?>
<?literallayoutstart?>
<para>body content</para>
<?literallayoutend?>
<?legalnoticeend?>
TEXT

# Turn <?namestart?> into <name> and <?nameend?> into </name>
$string =~ s!<\?([^\?]+)start\?>!<$1>!g;
$string =~ s!<\?([^\?]+)end\?>!</$1>!g;
print $string;
spudly
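
The two substitutions in the regex answer can also be folded into a single pass. This is only a sketch of that idea, and `pi_to_elements` is a hypothetical name. It assumes the PI targets consist of word characters followed by `start` or `end`, so other PIs such as `<?xml-stylesheet ...?>` are left untouched:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite <?namestart?> / <?nameend?> PIs
# as <name> / </name> in one pass over the string.
sub pi_to_elements {
    my ($xml) = @_;
    # \w+? is non-greedy so that the trailing "start" or "end" lands in $2
    $xml =~ s{<\?(\w+?)(start|end)\?>}{ $2 eq 'start' ? "<$1>" : "</$1>" }ge;
    return $xml;
}

print pi_to_elements("<?notestart?><para>x</para><?noteend?>\n");
# prints: <note><para>x</para></note>
```

Like the original, this does no nesting check, so the result should still be run through a real XML parser.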