I have a website that will have millions of pages in a directory. I'd like to store those files on-disk in a bunch of subdirectories based on the first characters of the page name.
For example http://mysite.com/hugedir/somefile.html
would be stored in /var/www/html/hugedir/s/o/m/e/f/ile.html
That is fairly trivial to do with a RewriteRule like so:
RewriteRule ^hugedir/(.)(.)(.)(.)(.)(.*).html /hugedir/{$1}/{$2}/{$3}/{$4}/{$5}/$6.html
RewriteRule ^hugedir/(.)(.)(.)(.)(.*).html /hugedir/{$1}/{$2}/{$3}/{$4}/{$5}.html
RewriteRule ^hugedir/(.)(.)(.)(.*).html /hugedir/{$1}/{$2}/{$3}/{$4}.html
RewriteRule ^hugedir/(.)(.)(.*).html /hugedir/{$1}/{$2}/{$3}.html
RewriteRule ^hugedir/(.)(.*).html /hugedir/{$1}/{$2}.html
RewriteRule ^hugedir/(.*).html /hugedir/{$1}.html
However, the file name may contain hyphens or other non-standard characters and I'd really like to avoid having a directory named with a strange character. Ideally, I'd like to have a list of 'approved' characters and either eliminate or transform the unapproved characters to an underscore.
Can anybody think of a way to do that? Or something else equivalent? Part of the requirement is that these be physical files on disk and it not be parsed with a scripting language.