Build arrays with hashes as indices:
Read file a.csv line by line and store a_hash[md5($line)] = array($offset, $length).
Read file b.csv line by line and store b_hash[md5($line)] = true.
Because the hashes are used as indices, duplicate lines automagically collapse into a single entry.
Then, for every hash that appears in both a_hash and b_hash, read back into a.csv (using the offset and length you stored in a_hash) to pull out the actual line text. If you're paranoid about hash collisions, store offset/length for b_hash as well and verify the text with stristr.
This will run a lot faster and use up far, far, FAR less memory.
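A rough, untested sketch of that first suggestion (I'm assuming the matched lines should end up in c.csv, as in the second suggestion, and I've left out error checking on the fopen() calls):

$a_file = fopen('a.csv', 'r');
$a_hash = array();
$offset = ftell($a_file);                   // byte offset of the line about to be read
while (($line = fgets($a_file)) !== false) {
    $a_hash[md5($line)] = array($offset, strlen($line));
    $offset = ftell($a_file);
}

$b_file = fopen('b.csv', 'r');
$b_hash = array();
while (($line = fgets($b_file)) !== false) {
    $b_hash[md5($line)] = true;
}
fclose($b_file);

$c_file = fopen('c.csv', 'w');
foreach ($a_hash as $hash => $pos) {
    if (isset($b_hash[$hash])) {
        list($off, $len) = $pos;
        fseek($a_file, $off);                   // jump back to where the line starts in a.csv
        fwrite($c_file, fread($a_file, $len));  // copy the original line text out
        // if you're worried about collisions, store offset/length in b_hash too and verify the actual text here
    }
}
fclose($a_file);
fclose($c_file);

Only a 32-character hash and two integers are kept per line, so neither file ever has to sit in memory in full.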
If you want to reduce the memory requirement even further and don't mind checking for duplicates, then:
Read file a.csv line by line and store a_hash[md5($line)] = false.
Read file b.csv line by line, hash the line, and check whether it exists in a_hash.
If a_hash[md5($line)] == false, write the line to c.csv and set a_hash[md5($line)] = true.
Some example code for the second suggestion:
$a_file = fopen('a.csv','r');
$b_file = fopen('b.csv','r');
$c_file = fopen('c.csv','w+');

if(!$a_file || !$b_file || !$c_file) {
    echo "Broken!<br>";
    exit;
}

// Hash every line of a.csv; false means "not written to c.csv yet".
// If the files may differ only in line endings, hash rtrim($line, "\r\n") in both loops instead.
$a_hash = array();
while(($line = fgets($a_file)) !== false) {
    $a_hash[md5($line)] = false;
}
fclose($a_file);

// For each line of b.csv, write it out the first time its hash matches a line from a.csv
while(($line = fgets($b_file)) !== false) {
    $hash = md5($line);
    if(isset($a_hash[$hash]) && !$a_hash[$hash]) {
        echo 'record found: ' . $line . '<br>';
        fwrite($c_file, $line);
        $a_hash[$hash] = true;   // flag it so duplicate lines in b.csv aren't written twice
    }
}
fclose($b_file);
fclose($c_file);