I'm trying to read some large text files (between 50MB and 200MB) and do a simple text replacement (essentially, the XML I have hasn't been properly escaped in a few regular, predictable cases). Here's a simplified version of the function:
<?php
function cleanFile($file1, $file2) {
    $input_file  = fopen($file1, "r");
    $output_file = fopen($file2, "w");
    while (!feof($input_file)) {
        $buffer = trim(fgets($input_file, 4096));
        // Wrap bare <text> lines in CDATA, skipping lines that already have it
        if (substr($buffer, 0, 6) == '<text>' AND substr($buffer, 0, 15) != '<text><![CDATA[') {
            $buffer = str_replace('<text>', '<text><![CDATA[', $buffer);
            $buffer = str_replace('</text>', ']]></text>', $buffer);
        }
        fputs($output_file, $buffer . "\n");
    }
    fclose($input_file);
    fclose($output_file);
}
?>
What I don't get is that for the largest of the files, around 150MB, PHP's memory usage goes off the charts (around 2GB) before it fails. I thought reading line by line like this was the most memory-efficient way to process large files. Is there some method I'm missing that would be more memory efficient? Or is there some setting that's keeping things in memory when they should be getting collected?
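For what it's worth, here's a rough sketch of how I'm thinking of instrumenting the loop to see whether memory climbs steadily or spikes at one point in the file (the helper name and the 100000-line interval are just placeholders I made up):

<?php
// Hypothetical helper: print peak memory every $every lines so I can see
// whether usage grows gradually or jumps all at once.
function reportMemory($line, $every = 100000) {
    if ($line % $every == 0) {
        echo $line . " lines, peak memory: "
            . memory_get_peak_usage(true) . " bytes"
            . " (memory_limit: " . ini_get('memory_limit') . ")\n";
    }
}
// Inside the while loop above I would call: reportMemory(++$line);
?>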
In other words, it's not working, I don't know why, and as far as I can tell I'm not doing anything incorrectly. Any direction for me to go? Thanks for any input.