For fun, I wrote a PHP class that classifies files with tags instead of a directory hierarchy. The tags are stored in the filename itself, in the form +tag1+tag2+tagN+MD5.EXTENSION, so I'm stuck with the 255-character filename limit imposed by the FS/OS. Here is the class:
<?php
class TagFS
{
    public $FS = null;

    function __construct($FS)
    {
        if (is_dir($FS) === true)
        {
            $this->FS = $this->Path($FS);
        }
    }

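    function Add($path, $tag)
    {
        if (is_dir($path) === true)
        {
            // Tag every entry of the directory, recursing into subdirectories.
            // "." and ".." are filtered by name: they are not guaranteed to be
            // the first two entries (names starting with "+" sort before ".").
            $files = array_diff(scandir($path), array('.', '..'));
            foreach ($files as $file)
            {
                $this->Add($this->Path($path) . $file, $tag);
            }
            return true;
        }
        else if (is_file($path) === true)
        {
            // Store the content once under its MD5 hash...
            $file = md5_file($path);
            if (is_file($this->FS . $file) === false)
            {
                if (copy($path, $this->FS . $file) === false)
                {
                    return false;
                }
            }
            // ...then hard-link it as +tag1+tag2+tagN+MD5.EXTENSION.
            return $this->Link($this->FS . $file, $this->FS . '+' . $this->Tag($tag) . '+' . $file . '.' . strtolower(pathinfo($path, PATHINFO_EXTENSION)));
        }
        return false;
    }
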
    function Get($tag)
    {
        return glob($this->FS . '*+' . str_replace('+', '{+,+*+}', $this->Tag($tag)) . '+*', GLOB_BRACE);
    }

    function Link($source, $destination)
    {
        if (is_file($source) === true)
        {
            if (function_exists('link') === true)
            {
                return link($source, $destination);
            }
            // Fallback for Windows builds of PHP without link().
            if (is_file($destination) === false)
            {
                // Escape both paths so spaces or quotes cannot break the command.
                exec('fsutil hardlink create ' . escapeshellarg($destination) . ' ' . escapeshellarg($source));
                if (is_file($destination) === true)
                {
                    return true;
                }
            }
        }
        return false;
    }

    function Path($path)
    {
        if (file_exists($path) === true)
        {
            $path = str_replace('\\', '/', realpath($path));
            if ((is_dir($path) === true) && ($path[strlen($path) - 1] != '/'))
            {
                $path .= '/';
            }
            return $path;
        }
        return false;
    }

    function Tag($string)
    {
        /*
        TODO:
        Remove (on Windows): . \ / : * ? " < > |
        Remove (on *nix): . /
        Remove (on TagFS): + * { }
        Remove (on TagFS - Possibly!) -
        Max Chars (in Windows) 255
        Max Chars (in *nix) 255
        */
        $result = array_filter(array_unique(explode(' ', $string)));
        if (empty($result) === false)
        {
            if (natcasesort($result) === true)
            {
                return strtolower(implode('+', $result));
            }
        }
        return false;
    }
}
?>
I believe this system works well for a handful of short tags, but my problem starts when the whole filename exceeds 255 chars. What approach should I take to get around the filename limit? I'm thinking of splitting the tags across several hard links to the same file, but the permutations may kill the system.
Are there any other ways to solve this problem?
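To make the splitting idea concrete, here is a minimal sketch of how the tags could be packed into several link names that each stay under the limit. `splitTags()` is a hypothetical helper, not part of the class above, and it assumes the limit is counted in bytes (hence `strlen`).

```php
<?php
// Hypothetical helper, not part of TagFS.
// Greedily packs the sorted tags into groups so that every generated name,
// "+tag1+tag2+...+MD5.EXT", stays within $limit bytes.
function splitTags(array $tags, string $md5, string $extension, int $limit = 255): array
{
    // Drop empty strings and duplicates, then sort like the original class does.
    $tags = array_values(array_unique(array_filter($tags, 'strlen')));
    sort($tags, SORT_NATURAL | SORT_FLAG_CASE);

    // Bytes always taken by "+MD5.EXT" at the end of each name.
    $suffix = strlen('+' . $md5 . '.' . $extension);

    $names = array();
    $group = '';
    foreach ($tags as $tag) {
        $piece = '+' . $tag;
        if (strlen($piece) + $suffix > $limit) {
            throw new LengthException("Tag too long for a single name: $tag");
        }
        // Start a new link name when this tag would push the current one over the limit.
        if ($group !== '' && strlen($group) + strlen($piece) + $suffix > $limit) {
            $names[] = $group . '+' . $md5 . '.' . $extension;
            $group = '';
        }
        $group .= $piece;
    }
    if ($group !== '') {
        $names[] = $group . '+' . $md5 . '.' . $extension;
    }
    return $names;
}

// A small limit of 50 is used here only to force a split.
print_r(splitTags(array('geoaki', 'cloud', 'tag'), '4151ae7900f33788d0bba5fc6c29bee3', 'jpg', 50));
```

This needs only one extra link per overflow rather than one per permutation, but a query that mixes tags stored on different links can no longer be answered with a single glob: the results of several lookups would have to be intersected by their MD5.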
EDIT - Some usage examples:
<?php
$images = new TagFS('S:');
$images->Add('P:/xampplite/htdocs/tag/geoaki.png', 'geoaki logo');
$images->Add('P:/xampplite/htdocs/tag/cloud.jpg', 'geoaki cloud tag');
$images->Add('P:/xampplite/htdocs/tag/cloud.jpg', 'nuvem azul branco');
$images->Add('P:/xampplite/htdocs/tag/xml-full.gif', 'geoaki auto vin api service xml');
$images->Add('P:/xampplite/htdocs/tag/dunp3d-1.jpg', 'dunp logo');
$images->Add('P:/xampplite/htdocs/tag/d-proposta-04c.jpg', 'dunp logo');
/*
[0] => S:/+api+auto+geoaki+service+vin+xml+29be189cbc98fcb36a44d77acad13e18.gif
[1] => S:/+azul+branco+nuvem+4151ae7900f33788d0bba5fc6c29bee3.jpg
[2] => S:/+cloud+geoaki+tag+4151ae7900f33788d0bba5fc6c29bee3.jpg
[3] => S:/+dunp+logo+0cedeb6f66cbfc3974c6b7ad86f4fbd3.jpg
[4] => S:/+dunp+logo+8b9fcb119246bb6dcac1906ef964d565.jpg
[5] => S:/+geoaki+logo+5f5174c498ffbfd9ae49975ddfa2f6eb.png
*/
echo '<pre>';
print_r($images->Get('*'));
echo '</pre>';
/*
[0] => S:/+azul+branco+nuvem+4151ae7900f33788d0bba5fc6c29bee3.jpg
*/
echo '<pre>';
print_r($images->Get('azul nuvem'));
echo '</pre>';
/*
[0] => S:/+dunp+logo+0cedeb6f66cbfc3974c6b7ad86f4fbd3.jpg
[1] => S:/+dunp+logo+8b9fcb119246bb6dcac1906ef964d565.jpg
[2] => S:/+geoaki+logo+5f5174c498ffbfd9ae49975ddfa2f6eb.png
*/
echo '<pre>';
print_r($images->Get('logo'));
echo '</pre>';
/*
[0] => S:/+dunp+logo+0cedeb6f66cbfc3974c6b7ad86f4fbd3.jpg
[1] => S:/+dunp+logo+8b9fcb119246bb6dcac1906ef964d565.jpg
*/
echo '<pre>';
print_r($images->Get('logo dunp'));
echo '</pre>';
/*
[0] => S:/+geoaki+logo+5f5174c498ffbfd9ae49975ddfa2f6eb.png
*/
echo '<pre>';
print_r($images->Get('geo* logo'));
echo '</pre>';
?>
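For reference, this is the glob pattern that Get() builds for the queries above. `tagPattern()` is a hypothetical stand-alone copy of that logic, written as a pure function so the pattern can be printed without touching the disk.

```php
<?php
// Hypothetical stand-alone copy of the pattern logic in TagFS::Get() and TagFS::Tag().
function tagPattern(string $query): string
{
    // Same normalisation as Tag(): unique, non-empty, naturally sorted, lowercase.
    $tags = array_filter(array_unique(explode(' ', $query)));
    natcasesort($tags);
    $tags = strtolower(implode('+', $tags));

    // Between two queried tags allow either a single "+" (adjacent tags)
    // or "+...+" (other tags in between).
    return '*+' . str_replace('+', '{+,+*+}', $tags) . '+*';
}

echo tagPattern('azul nuvem'), "\n"; // *+azul{+,+*+}nuvem+*
echo tagPattern('logo dunp'), "\n";  // *+dunp{+,+*+}logo+*
echo tagPattern('*'), "\n";          // *+*+*
```

Because the tags are sorted both in the filename and in the query, one left-to-right pattern is enough; no permutations of the query are needed.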
EDIT: Given the several suggestions to use a serverless database or some other kind of lookup table (XML, flat file, key/value pairs, etc.), I want to clarify the following: although this code is written in PHP, the idea is to port it to Python and make a desktop application out of it. Apart from the example, this has nothing to do with PHP. Furthermore, if I have to use some kind of lookup table I'll definitely go with SQLite 3, but what I'm looking for is a solution that doesn't involve any additional "technology" besides the filesystem (folders, files and hard links).
You may call me nuts, but I'm trying to accomplish two simple goals here: 1) keep the system "garbage" free (who likes Thumbs.db or .DS_Store, for example?) and 2) keep the files easily identifiable if for some reason the lookup table (in this case SQLite) gets busy, corrupted, lost or forgotten (in backups, for instance).
PS: This is supposed to run on Linux, Mac, and Windows (under NTFS).
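To illustrate goal 2, here is a minimal sketch of how the tags, hash and extension can be recovered from a link name alone, with no lookup table. `parseTagName()` is a hypothetical helper and assumes names of the form +tag1+tag2+tagN+MD5.EXTENSION as produced by Add().

```php
<?php
// Hypothetical helper, not part of TagFS: recovers the tags, MD5 hash and
// extension from a link name, or returns null if the name does not match.
function parseTagName(string $path): ?array
{
    // Accept both "/" and "\" as directory separators.
    $name = basename(str_replace('\\', '/', $path));

    // "+" tags "+" 32 hex digits, then an optional ".extension".
    if (preg_match('/^\+(.+)\+([0-9a-f]{32})(?:\.([^.+]*))?$/', $name, $m) !== 1) {
        return null;
    }
    return array(
        'tags'      => explode('+', $m[1]),
        'md5'       => $m[2],
        'extension' => isset($m[3]) ? $m[3] : '',
    );
}

print_r(parseTagName('S:/+dunp+logo+0cedeb6f66cbfc3974c6b7ad86f4fbd3.jpg'));
```

The untagged copy stored under its bare MD5 does not start with "+", so it is reported as null rather than as a tagged file.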