I need to do a file search with PHP. I have the filename search working with glob, but I still need to search inside files.

I have a prototype at tann98.vacau.com/file-search, but I need keywords and suggestions. Plus, it needs to look inside files to find matches.

Does anyone have ideas on how to do this kind of thing?

A: 
if (preg_match('/pattern/', file_get_contents($file))) {
   echo "found pattern in $file\n";
}
Alex Howansky
+1  A: 

A very basic method would be to read each file into PHP and search through them with one of the string searching functions.

// $filenames is assumed to hold the files to search, e.g. the result of glob()
foreach ($filenames as $filename) {
    $contents = file_get_contents($filename);
    if ($contents !== false && strpos($contents, $keyword) !== false) {
        //found a match!
    }
}

However this is very inefficient, since you will have to do that file reading and searching every single time you perform a search.

That's why search engines build an index of all the files they know about in advance, and then just look in that index for the search keyword. If you want to look into that, you would need a separate script (say indexer.php) that will do something like this:

  • loop through each file, getting its contents
  • break those into words
  • keep a record of unique words found in that file
  • store that record in a database or file on disk

And have it run every now and then to update its index. Its index could for example look like this:

$words = array(
    'mobile' => array('filename1.txt', 'filename2.txt'),
    'answer' => array('filename3.txt', 'filename5.txt', 'filename6.txt'),
    //...

);
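
As a rough illustration of the indexer steps above, here is a minimal sketch that produces an array shaped like $words. The function name buildIndex and the rule for splitting words (lowercase, split on anything that is not a letter or digit) are assumptions made for this example, not something a real indexer has to follow.

```php
<?php
// Hypothetical helper: builds a word => filenames index like the array above.
function buildIndex(array $filenames): array
{
    $index = array();
    foreach ($filenames as $filename) {
        $contents = file_get_contents($filename);
        if ($contents === false) {
            continue; // skip files that cannot be read
        }
        // Assumed word rule: lowercase, split on anything that is not a-z or 0-9.
        $words = preg_split('/[^a-z0-9]+/', strtolower($contents), -1, PREG_SPLIT_NO_EMPTY);
        // Record each unique word once per file.
        foreach (array_unique($words) as $word) {
            $index[$word][] = $filename;
        }
    }
    return $index;
}
```

You could call it with the result of glob(), for example buildIndex(glob('/path/to/files/*.txt')). Note that it still reads each file fully into memory, so it only suits reasonably small files.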

Then, when you are searching for a certain keyword, you just need to load the index from your index file or database and see which filenames that word is found in.

if (isset($words[$keyword])) {
    echo "Found in: " . join(', ', $words[$keyword]) ;
}
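
For the "load the index from your index file" part, one possible sketch is below. Storing the index as JSON, and the function names saveIndex and lookupKeyword, are assumptions for illustration only; a database or PHP's serialize() would work as well.

```php
<?php
// Hypothetical helpers; the JSON file format is an assumption for this sketch.
function saveIndex(array $words, string $indexFile): bool
{
    return file_put_contents($indexFile, json_encode($words)) !== false;
}

function lookupKeyword(string $indexFile, string $keyword): array
{
    if (!is_readable($indexFile)) {
        return array(); // no index yet
    }
    $words = json_decode(file_get_contents($indexFile), true);
    if (!is_array($words)) {
        return array(); // empty or corrupt index
    }
    // Assumes the indexer stored its words in lowercase.
    $keyword = strtolower($keyword);
    return isset($words[$keyword]) ? $words[$keyword] : array();
}
```

The indexer would call saveIndex() after each run, and the search page would only ever call lookupKeyword(), so no file contents are read at search time.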

And there you have a very simplistic way of doing something like this. Further down the road you can store the index into a database, count how many times a word is found in each file to provide more relevant results, etc etc.
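
To give an idea of the "count how many times a word is found" suggestion, here is a small sketch where the index stores a count per file and results are sorted by that count. The names buildCountIndex and rankedMatches, the word-splitting rule, and ranking purely by raw count are all assumptions for this example.

```php
<?php
// Hypothetical helper: index maps word => array(filename => occurrence count).
function buildCountIndex(array $filenames): array
{
    $index = array();
    foreach ($filenames as $filename) {
        $contents = file_get_contents($filename);
        if ($contents === false) {
            continue; // skip files that cannot be read
        }
        // Assumed word rule: lowercase, split on anything that is not a-z or 0-9.
        $words = preg_split('/[^a-z0-9]+/', strtolower($contents), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_count_values($words) as $word => $count) {
            $index[$word][$filename] = $count;
        }
    }
    return $index;
}

// Hypothetical helper: filenames containing $keyword, most occurrences first.
function rankedMatches(array $index, string $keyword): array
{
    if (!isset($index[$keyword])) {
        return array();
    }
    $matches = $index[$keyword];
    arsort($matches); // sort by count, descending, keeping filenames as keys
    return array_keys($matches);
}
```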

Fanis
Good explanation! If it's available to you, you can use cron jobs to update the index, for example daily. A cron job executes a file on the server at a preset time or interval (http://www.tophostingdeals.com/glossary.php).
matsolof
A: 

Reading the whole file into a variable in PHP? Seriously, come on! PHP is a hypertext scripting language! You will run into memory errors and other ugly things... Only do this if you know that your files don't exceed a few hundred kilobytes each...

If you want performance, here is a solution for you:

<?php
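// escapeshellarg() keeps the pattern and path from being interpreted by the shell
$handle = popen('grep ' . escapeshellarg('regex') . ' ' . escapeshellarg('/path/to/file.txt'), 'r');
// read all of grep's output rather than only the first 2096 bytes
$output = stream_get_contents($handle);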
pclose($handle);
?>

This uses the external grep utility. You can pass the -b switch to report the byte offset of each matching line, so you can see where the match was found. It works like this (this time using exec instead of popen, for demonstration purposes):

<?php
exec('grep -b "REGEX" /path/to/file.txt', $result);
?>

This will probably only work on Linux...

If you really want to do it in PHP, or this doesn't work for you, don't use file_get_contents or something similar; step through the file piece by piece instead. Maybe like this:

<?php
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    // fgets() returns false at end of file, which ends the loop
    while (($buffer = fgets($handle, 4096)) !== false) {
        if (preg_match('/pattern/', $buffer)) {
            echo "found pattern in $buffer\n";
        }
    }
    fclose($handle);
}
?>

Note that the buffer may be cut off at 4095 bytes... fgets reads line by line, but only up to the maximum length you specify (minus one byte).
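
If the cut-off is a problem, one option is to call fgets without a length, in which case it reads up to the end of the line. The sketch below wraps that in a function that returns the numbers of the matching lines. The name findMatchingLines is made up for this example, and a single very long line would still be held in memory in full.

```php
<?php
// Hypothetical helper: returns the 1-based numbers of the lines matching $pattern.
function findMatchingLines(string $path, string $pattern): array
{
    $handle = @fopen($path, 'r');
    if ($handle === false) {
        return array(); // file missing or unreadable
    }
    $matches = array();
    $lineNumber = 0;
    // Without a length argument, fgets() reads up to the end of the line.
    while (($line = fgets($handle)) !== false) {
        $lineNumber++;
        if (preg_match($pattern, $line) === 1) {
            $matches[] = $lineNumber;
        }
    }
    fclose($handle);
    return $matches;
}
```

Usage would be something like findMatchingLines('/tmp/inputfile.txt', '/pattern/'), with an empty array meaning no match (or an unreadable file).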

Joe Hopfgartner