The code below goes through the files in a directory, reads them, and saves the lines to a new directory in files of 500 lines max. It works great for me (thanks Daniel), but I need a modification: I would like to save to files organized alphanumerically.
First, I assume sorting the array alphanumerically (the lines are already lowercase) would be the first step.
Then grab all of the lines in each $incoming."/*.txt" that start with "a" and put them into a folder at $save500."/a", but a max of 500 lines per file. (I guess it would be best to start with the first bucket at the top of the sort, so "0" rather than "a", right?)
All the lines that start with a number go into $save500."/num".
None of the lines will start with anything but a-z0-9.
This will allow me to search my files for a match more efficiently with this flat-file method, narrowing the search down to one folder.
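The steps above could start with a small helper that maps a line to its bucket folder name. This is just a sketch of the idea, and the function name bucketFor is my own invention, not anything from the original code:

```php
<?php
// Hypothetical helper: map a line to its bucket folder name.
// Digits 0-9 all share one "num" bucket; letters get their own folder.
function bucketFor(string $line): string
{
    $first = strtolower($line[0]);
    return ctype_digit($first) ? "num" : $first;
}

// Sorting the merged lines first keeps each bucket's files in order;
// "0"-"9" sort before "a"-"z" in ASCII, so the numbers come first.
$lines = ["zebra", "apple", "9lives", "aardvark"];
sort($lines);
echo implode(",", array_map('bucketFor', $lines)); // num,a,a,z
```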
// Pick the next file number based on what is already in $save500.
$existing = glob($save500 . "/*.txt");
$nextfile = ($existing === false) ? 1 : count($existing) + 1;
$files = glob($incoming . "/*.txt");
$lines = array();
foreach ($files as $file) {
    $lines = array_merge($lines, file($file, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES));
}
$lines = array_unique($lines);
/* this would put them all in one file */
/* file_put_contents($dirname."/done/allofthem.txt", implode("\n", $lines)); */
/* this breaks them into files of 500 */
foreach (array_chunk($lines, 500) as $chunk) {
    file_put_contents($save500 . "/" . $nextfile . ".txt", implode("\n", $chunk));
    $nextfile++;
}
Each file still needs to be a max of 500 lines.
I will graduate to MySQL later on; I've only been doing this a couple of months now.
As if that is not enough, I even thought of taking the first two characters off and making directories with subdirectories, a/0 through z/z!
The above could be the wrong approach, since there have been no responses.
But I want a word like aardvark saved to 1.txt in the a/a folder (appending), unless 1.txt already has 500 lines, in which case save it to 2.txt in a/a.
So xenia would be appended to 1.txt in the x/e folder, unless that file has 500 lines, in which case create 2.txt and save it there.
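Deriving that two-character folder path could look something like this. Again just a sketch: pathFor is a made-up name, and the fallback for one-character lines (repeating the first character) is my own assumption, since the post doesn't say what should happen there:

```php
<?php
// Hypothetical helper: the first two characters pick the folder, e.g.
// "aardvark" -> "a/a", "xenia" -> "x/e". A single-character line falls
// back to repeating its first character (assumed behavior).
function pathFor(string $line): string
{
    $a = strtolower($line[0]);
    $b = strtolower($line[1] ?? $line[0]);
    return $a . "/" . $b;
}

echo pathFor("aardvark"); // a/a
echo pathFor("xenia");    // x/e
```

The same 500-line chunking from earlier would then run inside each of those two-character folders.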
I will then be able to search for those words more efficiently, without loading a ton into memory or looping through files and lines that won't contain a match.
Thanks everyone!