I want to merge multiple XML files into a single XML file in Perl.

File 1:

<r1>
  <searchpath>
    <dir>/usr/bin</dir>
    <dir>/usr/local/bin</dir>
    <dir>/usr/X11/bin</dir>
  </searchpath>
</r1>

File 2:

<r2>
  <user login="grep" fullname="Gary R Epstein" />
  <user login="stty" fullname="Simon T Tyson" />
</r2>

Merged file:

<XML>
  <r1>
    <searchpath>
      <dir>/usr/bin</dir>
      <dir>/usr/local/bin</dir>
      <dir>/usr/X11/bin</dir>
    </searchpath>
  </r1>
  <r2>
    <user login="grep" fullname="Gary R Epstein" />
    <user login="stty" fullname="Simon T Tyson" />
  </r2>
</XML>
+1  A: 

First sort the files, then open all of them and read the first record of each. Scan the current records to find the one that sorts first, process it, and read the next record from that file. Repeat until every file is exhausted.

dlowe
Is there any way to do it like the hjsplit software does?
joe
+1  A: 

Edited for the new information from the asker.

If you just want to process the contents of all those files, this should work:

@ARGV = qw<F1 f2 f3 f4>;
print "<XML>\n";
while ( my $line = <> ) { 
   print "    $line";
}
print "</XML>\n";

Of course, you could just cat the files together if you care as little about indentation as XML does, and bookend the result with "<XML>\n" ... "</XML>\n".


The name of the current file is in $ARGV if you need it. The number of the current record is in $. (or, via the English module, $NR or $INPUT_LINE_NUMBER).

Merge

If you want to merge files in the sort-merge sense, they need to be sorted first (File::Sort). You then need a dedicated buffer for each file you want to merge, and you scan the buffers for the lowest record according to the sorting scheme. Once you have chosen a buffer, you refresh it from its file and process the record you took from it.

Those steps are:

  1. Pick first in concatenation order
  2. Refresh from respective file, flag if EOF
  3. Process record
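
As a rough illustration of those three steps, here is a minimal sketch in core Perl. It assumes line-oriented records and inputs that are already sorted; the name merge_sorted and the comparison callback are hypothetical, not part of any module mentioned here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge already-sorted filehandles record by record.
# $cmp is a callback that compares two records and returns <0, 0 or >0.
sub merge_sorted {
    my ( $cmp, @handles ) = @_;

    # One buffer per input: the handle plus its current, unprocessed record.
    my @buffers;
    for my $fh (@handles) {
        my $record = <$fh>;
        push @buffers, { fh => $fh, record => $record } if defined $record;
    }

    my @merged;
    while (@buffers) {
        # 1. Pick the buffer holding the lowest record (first wins on ties).
        my $best = 0;
        for my $i ( 1 .. $#buffers ) {
            $best = $i
              if $cmp->( $buffers[$i]{record}, $buffers[$best]{record} ) < 0;
        }
        my $record = $buffers[$best]{record};

        # 2. Refresh that buffer from its file; drop the buffer at EOF.
        my $fh   = $buffers[$best]{fh};
        my $next = <$fh>;
        if ( defined $next ) { $buffers[$best]{record} = $next }
        else                 { splice @buffers, $best, 1 }

        # 3. Process the record (here it is simply collected).
        push @merged, $record;
    }
    return @merged;
}

# Demo: in-memory filehandles stand in for two sorted files.
my $data_a = "apple\ncherry\n";
my $data_b = "banana\ndate\n";
open my $fh_a, '<', \$data_a or die "Cannot open in-memory file: $!";
open my $fh_b, '<', \$data_b or die "Cannot open in-memory file: $!";
print merge_sorted( sub { $_[0] cmp $_[1] }, $fh_a, $fh_b );
```

The linear scan in step 1 is fine for a handful of files; with many inputs a heap would probably be a better fit.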

I would create a Buffer as well as BufferSet class to encapsulate this functionality. The Buffer knows how to offer up the current record when asked, and to refresh from its IO source, when chosen. The BufferSet knows to look for next record from its list of Buffer objects and to handle the Buffer objects. The BufferSet object should definitely know the sorting order, and it might also handle the job of making sure that any buffer has been sorted.

You can use Class::Delegator to make the BufferSet behave like a straight IO object, if you wanted to do that.

Axeman
I have changed the question. Thanks for the information.
joe
+4  A: 
#!/usr/bin/perl

use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();
my $xml1 = $parser->parse_string( <<'XML' );
<r1>
   <searchpath>
     <dir>/usr/bin</dir>
     <dir>/usr/local/bin</dir>
     <dir>/usr/X11/bin</dir>
   </searchpath>
 </r1>
XML

my $xml2 = $parser->parse_string( <<'XML' );
<r2>
  <user login="grep" fullname="Gary R Epstein" />
  <user login="stty" fullname="Simon T Tyson" />
</r2>
XML

my $new_xml = XML::LibXML::Element->new( 'XML' );
$new_xml->appendWellBalancedChunk( $xml1->documentElement()->toString() );
$new_xml->appendWellBalancedChunk( $xml2->documentElement()->toString() );
print $new_xml->toString(1);

You can also use $parser->parse_file($filename) if your data is in files instead of strings (see perldoc XML::LibXML::Parser).

The 1 in $new_xml->toString(1) is to properly indent the output. See perldoc XML::LibXML::Node for information about that one.
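
For data that lives in files, a variant along the following lines might work. It is only a sketch: the helper name merge_xml_files is hypothetical, the file names come from the command line, and it uses importNode to copy each root element into a new document instead of round-tripping through strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: wrap the root elements of several XML files
# in a new root element named $root_name and return the new document.
sub merge_xml_files {
    my ( $root_name, @files ) = @_;

    my $merged = XML::LibXML::Document->new( '1.0', 'UTF-8' );
    my $root   = $merged->createElement($root_name);
    $merged->setDocumentElement($root);

    for my $file (@files) {
        my $doc = XML::LibXML->load_xml( location => $file );

        # importNode makes a deep copy owned by $merged;
        # the source document is left untouched.
        $root->appendChild( $merged->importNode( $doc->documentElement ) );
    }
    return $merged;
}

# Usage: perl merge.pl file1.xml file2.xml > merged.xml
print merge_xml_files( 'XML', @ARGV )->toString(1) if @ARGV;
```

Because each input is parsed separately, any XML declaration or DOCTYPE in the source files is left behind and only the root elements end up in the merged output.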

Fork it here: http://github.com/robinsmidsrod/xml-merge

Robin Smidsrød
+2  A: 
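#!/usr/bin/perl
use strict;
use warnings;

# Wrap the concatenated input files (named on the command line) in one root.
print "<xml>\n";
print while <>;
print "</xml>\n";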
dsm