
Using PHP I have tried all day to get this done. I failed. I want to:

  1. open a directory and read all files there.
  2. read each file's contents line by line (each line is a name with no spaces, in a single column).
  3. put each line into a new file (newline by newline).
  4. remove duplicate lines.
  5. save the new file.

Easy for the gurus, mind numbing for me.

NOTE: Each file may be 500 lines long at 20 characters per line, but there are only around 20 files.

Thanks in advance for the help.

Thanks again. Based on the posts below I tried

    $topdir = '/home/mycal25/public_html/processed/';

    $files = glob($topdir."*.txt"); //matches all text files

    $lines = array();
    foreach($files as $file)
    {
        $lines = array_merge($lines, file($file, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES));
    }
    $lines = array_unique($lines);

    file_put_contents($topdir."all/all.txt", implode("\n", $lines));

But that did not work. I tried a couple of other variations, to no avail.

+3  A: 

Something like:

    // $files is an array of the file paths to read
    $lines = array();
    foreach ($files as $file) {
        // file() keeps each line's trailing newline
        $lines = array_merge($lines, file($file));
    }

    $lines = array_unique($lines);

    $fp = fopen('dest.txt', 'w');
    foreach ($lines as $line) {
        fwrite($fp, $line);
    }
    fclose($fp);

Alternatively, you could check for unique entries each time you load a new file. This would save on RAM but potentially use more CPU.
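
As a rough sketch of that alternative (not tested against your files): read each file line by line and keep an array of the lines already written, so only the unique lines are ever held in memory. The function name `mergeUniqueLines` and its parameters are made up for illustration.

```php
<?php
// Hypothetical helper: write the unique, non-empty lines of $files to $dest.
// Returns the number of unique lines written, or -1 if $dest cannot be opened.
function mergeUniqueLines(array $files, string $dest): int
{
    $seen = array();                 // lines already written, used as a set
    $out = fopen($dest, 'w');
    if ($out === false) {
        return -1;
    }
    foreach ($files as $file) {
        if (!is_readable($file)) {
            continue;                // skip anything we cannot read
        }
        $in = fopen($file, 'r');
        if ($in === false) {
            continue;
        }
        while (($line = fgets($in)) !== false) {
            $line = rtrim($line, "\r\n");
            if ($line === '' || isset($seen[$line])) {
                continue;            // skip blank lines and duplicates
            }
            $seen[$line] = true;
            fwrite($out, $line . "\n");
        }
        fclose($in);
    }
    fclose($out);
    return count($seen);
}
```

You would call it with the array from glob() and the output path, e.g. `mergeUniqueLines($files, 'dest.txt');`.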

Based on your comment about opendir, you can do something like the following:

    $files = glob('/home/mycal25/public_html/processed/*');

or sticking with opendir()

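    $topdir = '/home/mycal25/public_html/processed';
    $lines = array();
    $dh = opendir($topdir);
    while (($file = readdir($dh)) !== false) {
        $path = $topdir . '/' . $file;
        if (is_file($path)) { // skip '.', '..' and subdirectories
            $lines = array_merge($lines, file($path));
        }
    }
    closedir($dh);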

I've skipped some vital error checking in places, just to make the code shorter and easier to read. But if you want to be sure, always check the return values from opendir/glob/fopen, etc.
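
For what it's worth, here is a minimal sketch of what that checking could look like for the opendir() version. The helper name `listFilesOrFail` is hypothetical, and throwing an exception is just one way to react to a failure.

```php
<?php
// Hypothetical helper: return the regular files in $dir, or throw if the
// directory cannot be opened.
function listFilesOrFail(string $dir): array
{
    if (!is_dir($dir)) {
        throw new RuntimeException("Not a directory: $dir");
    }
    $dh = opendir($dir);
    if ($dh === false) {             // opendir() returns false on failure
        throw new RuntimeException("Unable to open directory: $dir");
    }
    $files = array();
    while (($entry = readdir($dh)) !== false) {
        $path = $dir . '/' . $entry;
        if (is_file($path)) {        // skips '.', '..' and subdirectories
            $files[] = $path;
        }
    }
    closedir($dh);
    sort($files);                    // readdir() order is not guaranteed
    return $files;
}
```

The same idea applies to glob() and fopen(): both return false on failure, so compare with `=== false` before using the result.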

bramp
opendir('/home/mycal25/public_html/processed') or exit("Unable to open directory!")) { ????
Jimbo
Would I open the directory this way then create the array?
Jimbo
I've updated the post to show how you can use opendir.
bramp
AWESOME!!!!! I will try after dinner.
Jimbo
A: 

Just to point out, using sort -u on a Unix-based system could make this really easy, if the sort order of the new file doesn't matter.

If you're running PHP on a Unix-based host, you can most likely call sort through system().
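
A minimal sketch of that, assuming a Unix-like host where PHP is allowed to run shell commands. The helper name `sortUniqueViaShell` is made up, and it uses exec() rather than system() so the exit status can be checked without echoing any output.

```php
<?php
// Hypothetical helper: let the system's sort command merge and de-duplicate
// $files into $dest. Assumes a Unix-like host with exec() enabled.
function sortUniqueViaShell(array $files, string $dest): bool
{
    if (count($files) === 0) {
        return false;                // with no inputs, sort would wait on stdin
    }
    // Escape every path so spaces or odd characters cannot break the command.
    $args = implode(' ', array_map('escapeshellarg', $files));
    $cmd = 'sort -u -o ' . escapeshellarg($dest) . ' ' . $args;
    exec($cmd, $output, $status);
    return $status === 0;            // sort exits with 0 on success
}
```

Note that the output comes back sorted, so this only fits if the original order of the names doesn't matter.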

ba
Something like "cat * | sort -u > output" would work great.
bramp
A: 

This should work for you. Change the glob pattern if needed.

    $files = glob("*.txt"); //matches all text files

    $lines = array();
    foreach($files as $file)
    {
        $lines = array_merge($lines, file($file, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES));
    }
    $lines = array_unique($lines);

    file_put_contents("output.txt", implode("\n", $lines));
Tim Cooper
Ahh yes. They are all txt files as well. Still though, where is the open directory to read all the files?? Sorry guys.. I am learning, I hope..
Jimbo
The `glob` function gets the names of all the text files in the current directory, so there is no need to call `opendir`.
Tim Cooper
A: 

8 hours wasn't for nothing; think like that and you'll definitely hate programming! I see a very good solution to the problem that might have a few bugs, but all the thinking and big strokes are there. You may just need some improvements on your methods of debugging.

Here's what I'd do: instead of inlining function calls write them out as their own statements and save their return values to meaningful variables. Check this out:

    $topDir = '/home/mycal25/public_html/processed/';

    /* Grab names of all needed text files */
    $filePaths = glob($topDir . '*.txt');

    $names = array();

    foreach($filePaths as $filePath) {
        $fileLines = file($filePath, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES);
        $names = array_merge($names, $fileLines);
    }

    $uniqueNames = array_unique($names);

    $nameList = implode("\n", $uniqueNames);

    file_put_contents($topDir . 'all/all.txt', $nameList);

That'd be my personal style. What you can do now is var_dump() every variable and run your script. The output will eventually show you which variable doesn't contain what you expected it to contain.
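
One way to do that without cluttering the script, as a rough sketch: wrap var_dump() in a small helper that labels the output and hands the value back. The name `dumpStep` is made up for illustration.

```php
<?php
// Hypothetical helper: print a labelled var_dump() of $value, then return
// the value unchanged so it can wrap an existing expression.
function dumpStep(string $label, $value)
{
    echo "--- $label ---\n";
    var_dump($value);
    return $value;
}
```

For example, `$filePaths = dumpStep('filePaths', glob($topDir . '*.txt'));` shows right away whether the pattern matched any files (an empty array means it matched nothing).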

Also, make sure that all error reporting has been enabled. Shameless plug: http://www.needtodevelop.com/error-reporting-in-php
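
In case the link goes away, the usual minimum is something like the following at the top of the script while debugging. This is a sketch for a development setup, not something to leave on for a live site.

```php
<?php
// Report every error level, including notices and warnings.
error_reporting(E_ALL);
// Print errors to the output instead of only logging them.
ini_set('display_errors', '1');
```

With this on, a typo in a variable name shows up as an undefined variable message instead of failing silently.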

erisco
Thank you. I tried this one too and still nothing. I will try again tomorrow. The one from Tim above worked but, it had to be in the same directory. I need to do this from a script in a lower directory. Thanks for all the help.
Jimbo
I didn't write the solution for you; I merely gave you a version that is slightly easier to debug.
erisco
A: 
<?php

$files = glob('*.txt'); // assumption: same pattern as in the answers above
$lines = array();

foreach($files as $file)
{
    $lines = array_merge($lines, array_fill_keys(file($file, FILE_SKIP_EMPTY_LINES), 1));
}

file_put_contents('file.txt', implode(array_keys($lines)));

?>
joebert