Duplicate data removal using Perl, run on Windows via a batch file. The batch file opens a DOS window and calls the Perl script, which carries out the actions; I already have the batch file working. The script I have removes duplicate data correctly so long as the data file is not too big. The problem that needs resolving is with larger data files (2 GB or more): at that size a memory error occurs when the complete file is loaded into an array for duplicate data removal. The memory error occurs in the subroutine at:
@contents_of_the_file = <INFILE>;
A completely different method is acceptable so long as it solves this issue; please suggest one. (A rough line-by-line idea I have sketched is shown after the subroutine.) The subroutine is:
sub remove_duplicate_data_and_file
{
    # LOCK_SH/LOCK_EX/LOCK_UN come from "use Fcntl ':flock';" at the top of
    # the script; they replace the bare 1 and 8 used previously.
    open(INFILE, '<', $output_working_directory . $output_working_filename)
        or dienice("Can't open $output_working_filename : INFILE :$!");
    flock(INFILE, LOCK_SH) if $test ne "YES";

    # This slurp is where the memory error occurs on files of 2 GB or more.
    @contents_of_the_file = <INFILE>;

    flock(INFILE, LOCK_UN) if $test ne "YES";
    close(INFILE);
    ### TEST print "$#contents_of_the_file\n\n";

    # Strip the newlines kept by the slurp so the print below does not add a second one.
    chomp(@contents_of_the_file);

    # Keep the first occurrence of each line; %seen counts lines already kept.
    my %seen;
    @unique_contents_of_the_file = grep { !$seen{$_}++ } @contents_of_the_file;

    open(OUTFILE, '>', $output_restore_split_filename)
        or dienice("Can't open $output_restore_split_filename : OUTFILE :$!");
    flock(OUTFILE, LOCK_EX) if $test ne "YES";    # exclusive lock, since we are writing
    foreach my $line (@unique_contents_of_the_file)
    {
        print OUTFILE "$line\n";
    }
    flock(OUTFILE, LOCK_UN) if $test ne "YES";
    close(OUTFILE);    # this close was missing before
}
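
A line-by-line approach is the sort of thing I am after. Below is a rough, untested sketch of the idea (not my working code): stream the file and write each line out the first time it is seen, keeping only a 16-byte MD5 digest of each seen line in memory rather than the line itself. Digest::MD5 is a core module; the subroutine name and parameters are made up for illustration.

use strict;
use warnings;
use Digest::MD5 qw(md5);    # md5() returns a 16-byte binary digest

# Sketch only: stream the input, emit each distinct line once.
sub remove_duplicates_streaming
{
    my ($infilename, $outfilename) = @_;    # hypothetical parameters
    open(my $in,  '<', $infilename)  or die "Can't open $infilename : $!";
    open(my $out, '>', $outfilename) or die "Can't open $outfilename : $!";
    my %seen;    # keys are digests, so memory per unique line is small and fixed
    while (my $line = <$in>)
    {
        print {$out} $line unless $seen{ md5($line) }++;
    }
    close($in);
    close($out);
}

Memory here still grows with the number of unique lines (one digest plus hash overhead per line), and in principle two different lines could share a digest. If even that is too much, the fallback I can think of is an external sort of the file followed by a single pass that drops adjacent duplicate lines.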