
I posted this question here before, but there were no responses. I may have done something wrong, so here it is again with some more details.

The files in the directory are named 1.txt, 2.txt, 3.txt, etc. The snippet below finds all the *.txt files in that directory, reads them, removes duplicate lines, and writes one file with all the unique contents (names, in this case).

$files = glob($dirname."/*.txt"); //matches all text files
$lines = array();
foreach($files as $file)
{
    // Read each file as an array of lines, without newlines or blank lines
    $lines = array_merge($lines, file($file, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES));
}
$lines = array_unique($lines);
file_put_contents($dirname."/allofthem.txt", implode("\n", $lines));

The above works great for me, thanks to great help here at Stack Overflow!

But I would like to take it one step further.

Instead of one big duplicate-free "allofthem.txt" file, how can I modify the code above to create files of at most 500 lines each from the new data?

They need to go into a new directory, e.g. $dirname."/done/".$i.".txt". I have tried counting in the loop, but my attempts don't work and ended up a mile long.

I also tried pushing 500 lines into one array, moving on to another array, and saving each one that way. No luck. I am just not "getting" it.
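For reference, the counting approach described in the question can work; the trick is to store a group once it is full and to flush the leftover lines after the loop. This is a minimal sketch, not code from any answer here: `split_with_counter` is a hypothetical name, and writing each group to disk (for example with `file_put_contents($dirname."/done/".$i.".txt", ...)` as in the question) is left out so the example has no side effects.

```php
<?php
// Hypothetical helper: split $lines into groups of at most $max lines
// using an explicit counter (the group's size), as the question attempts.
function split_with_counter(array $lines, int $max): array
{
    $groups = [];   // finished groups
    $current = [];  // group currently being filled
    foreach ($lines as $line) {
        $current[] = $line;
        if (count($current) === $max) {
            $groups[] = $current; // group is full: store it, start a new one
            $current = [];
        }
    }
    if ($current !== []) {
        $groups[] = $current;     // leftover lines form a final, shorter group
    }
    return $groups;
}

// Example: 7 names, at most 3 per group.
$groups = split_with_counter(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 3);
foreach ($groups as $i => $group) {
    echo ($i + 1) . ".txt: " . implode(",", $group) . "\n";
}
// 1.txt: a,b,c
// 2.txt: d,e,f
// 3.txt: g
```

With `$max = 500` this yields the same groups that `array_chunk($lines, 500)` produces; the built-in is simply shorter.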

Again, this beginner needs some expert assistance. Thanks in advance.

+1  A: 

This function will get you somewhere!

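function files_identical($fn1, $fn2) {
    // Different types (e.g. file vs. directory) or sizes can never match
    if(filetype($fn1) !== filetype($fn2))
        return FALSE;

    if(filesize($fn1) !== filesize($fn2))
        return FALSE;

    // Open both files in binary mode; give up if either cannot be opened
    if(!$fp1 = fopen($fn1, 'rb'))
        return FALSE;

    if(!$fp2 = fopen($fn2, 'rb')) {
        fclose($fp1);
        return FALSE;
    }

    // Compare the files in 4 KB blocks, stopping at the first difference
    $same = TRUE;
    while (!feof($fp1) and !feof($fp2)) {
        if(fread($fp1, 4096) !== fread($fp2, 4096)) {
            $same = FALSE;
            break;
        }
    }

    // If one file reached EOF before the other, they differ
    if(feof($fp1) !== feof($fp2))
        $same = FALSE;

    fclose($fp1);
    fclose($fp2);

    return $same;
}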

Src: http://www.php.net/manual/en/function.md5-file.php#94494
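Since the source above is the `md5_file` manual page, a shorter alternative is to compare file digests instead of reading block by block. This is a sketch, not the function above: `files_same_hash` is a hypothetical name, it uses the standard `hash_file('sha256', ...)` call, and it treats equal digests as equal contents, which relies on hash collisions being practically impossible rather than on a byte-exact check.

```php
<?php
// Hypothetical helper: two files are considered identical when their
// sizes and SHA-256 digests match (assumes no hash collisions).
function files_same_hash(string $a, string $b): bool
{
    if (filesize($a) !== filesize($b)) {
        return false; // cheap early exit, as in files_identical()
    }
    return hash_file('sha256', $a) === hash_file('sha256', $b);
}

// Usage with throwaway files in the system temp directory.
$f1 = tempnam(sys_get_temp_dir(), 'cmp');
$f2 = tempnam(sys_get_temp_dir(), 'cmp');
$f3 = tempnam(sys_get_temp_dir(), 'cmp');
file_put_contents($f1, "alice\nbob\n");
file_put_contents($f2, "alice\nbob\n");
file_put_contents($f3, "alice\ncarol\n");

$same12 = files_same_hash($f1, $f2); // true: identical contents
$same13 = files_same_hash($f1, $f3); // false: sizes differ
var_dump($same12, $same13);

unlink($f1);
unlink($f2);
unlink($f3);
```

Note that this compares whole files; it does not remove duplicate lines across files, which is what the question asks for.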

RobertPitt
+5  A: 

Once you have your array of lines as per your code, you can break it into chunks of 500 lines using array_chunk, and then write each chunk to its own file:

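// ... from your code
$lines = array_unique($lines);

// file_put_contents() does not create directories, so make sure done/ exists
if (!is_dir($dirname . "/done")) {
  mkdir($dirname . "/done");
}

$counter = 1;
foreach (array_chunk($lines, 500) as $chunk)
{
  // Each chunk holds at most 500 lines; only the last one can be shorter
  file_put_contents($dirname . "/done/" . $counter . ".txt", implode("\n", $chunk));
  $counter++;
}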
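To see what `array_chunk` hands to the loop, here is a small sketch with made-up data (1,201 hypothetical names instead of real files). It only prints what each file would receive. It also shows that the gaps `array_unique` leaves in the keys do not matter, because `array_chunk` renumbers each chunk from 0 unless its third argument `preserve_keys` is true.

```php
<?php
// Illustrative data: 1,201 hypothetical unique names.
$lines = [];
for ($n = 1; $n <= 1201; $n++) {
    $lines[] = "name$n";
}

// array_chunk() splits into arrays of at most 500 elements;
// only the last chunk may be shorter.
$chunks = array_chunk($lines, 500);

foreach ($chunks as $i => $chunk) {
    // $i + 1 matches the 1-based file numbering ($counter) in the answer.
    printf("%d.txt would get %d lines (%s ... %s)\n",
        $i + 1, count($chunk), $chunk[0], end($chunk));
}
// 1.txt would get 500 lines (name1 ... name500)
// 2.txt would get 500 lines (name501 ... name1000)
// 3.txt would get 201 lines (name1001 ... name1201)

// Keys left with gaps by array_unique() are renumbered in each chunk.
print_r(array_chunk(array_unique(['a', 'b', 'a', 'c']), 2));
```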
Daniel Vandersluis
+1 More elegant solution than mine!
halfdan
+1 for `array_chunk()`.
BoltClock
WOW! Worked like a charm!!!
Jim_Bo
A: 
$files = glob($dirname."/*.txt"); //matches all text files
$lines = array();
foreach($files as $file)
{
    $lines = array_merge($lines, file($file, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES));
}
$lines = array_unique($lines);
$lines_per_file = 500;
// Round up so a partial last block still gets a file, and keep the count
// an integer so the loop never runs past the last non-empty slice
$num_files = (int) ceil(count($lines) / $lines_per_file);
for($i = 0; $i < $num_files; $i++) {
    // array_slice() works by position, so gaps in the keys don't matter
    $write = array_slice($lines, $lines_per_file * $i, $lines_per_file);
    file_put_contents($dirname."/done/".$i.".txt", implode("\n", $write));
}
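The number of files this slicing approach needs is the ceiling of the line count divided by the lines per file. The sketch below, using made-up line counts, compares that with computing `count($lines) / $lines_per_file` and adding 1 when there is a remainder: the division yields a float (1201 / 500 = 2.402, so 3.402 after the increment), and a loop running while `$i < 3.402` executes four times, the last time on an empty slice.

```php
<?php
$lines_per_file = 500;
$results = [];

foreach ([1000, 1201] as $count) {
    // Float division plus an increment when there is a remainder.
    $float_files = $count / $lines_per_file;
    if ($count % $lines_per_file > 0) {
        $float_files++;
    }
    $iterations = 0;
    for ($i = 0; $i < $float_files; $i++) {
        $iterations++;
    }

    // Integer ceiling: exactly the number of non-empty slices.
    $needed = (int) ceil($count / $lines_per_file);

    $results[$count] = [$iterations, $needed];
    printf("%d lines: loop runs %d times, %d files needed\n",
        $count, $iterations, $needed);
}
// 1000 lines: loop runs 2 times, 2 files needed
// 1201 lines: loop runs 4 times, 3 files needed
```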
halfdan