I have a Perl app that parses MediaWiki SQL tables and displays data from multiple wiki pages. I need to be able to re-create the absolute image path to display the images, e.g.: [.../f/fc/Herbs.jpg/300px-Herbs.jpg]

From MediaWiki Manual:Image_Authorisation: "the [image] path can be calculated easily from the file name and..." How is the path calculated?

+1  A: 

One possible way would be to calculate the MD5 signature of the file (or the file ID in a database), and then build/find the path based on that.

For example, say we get an MD5 signature like "1ff8a7b5dc7a7d1f0ed65aaa29c04b1e"

The path might look like "/1f/f" or "/1f/f8/a7".

The reason is that you don't want all the files in one folder, and you want to be able to "partition" them evenly across different servers, a SAN, or whatever.

The MD5 signature is a string of 32 "hex" characters. Each two-character level has 256 possible names, so our example of "/1f/f8/a7" gives us 256*256*256 folders to store the files in. That ought to be enough for anybody :)
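A minimal sketch of this hypothetical scheme (which, as noted in the update, is not MediaWiki's): take the first few two-character chunks of the hex digest as directory names. `partition_path` is a made-up helper name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hex digest into a nested directory path
# using $levels two-character chunks. This is NOT MediaWiki's layout.
sub partition_path {
    my ($sig, $levels) = @_;
    my @chunks = map { substr($sig, 2 * $_, 2) } 0 .. $levels - 1;
    return join '/', @chunks;
}

my $sig = '1ff8a7b5dc7a7d1f0ed65aaa29c04b1e';
print partition_path($sig, 3), "\n";   # prints 1f/f8/a7

# Each level has 16 * 16 = 256 possible names, so three levels give:
print 256 ** 3, "\n";                  # prints 16777216
```

Adding or removing levels only changes `$levels`; the chunks always come from the start of the digest, so the same signature always maps to the same folder.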


Update, due to popular demand:

NOTE - I just realized we are talking specifically about how MediaWiki does it. This is not how MediaWiki does it, but another way in which it could have been done.

By "MD5 signature" I mean doing something like this (code examples in Perl):

use Digest::MD5 'md5_hex';
my $sig = md5_hex( $file->id );

$sig is now 32 hexadecimal characters long: "1ff8a7b5dc7a7d1f0ed65aaa29c04b1e"

Then build a folder structure like this:

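use File::Path 'make_path';

my $path = '/usr/local/media';
$path .= "/$_" for $sig =~ m/^(..)(..)(..)/;    # append 1f/f8/a7
make_path($path, { mode => 0755 });             # create every missing level
open my $ofh, '>', "$path/$sig"
  or die "Cannot open '$path/$sig' for writing: $!";
print $ofh "File contents";
close($ofh) or die "Cannot close '$path/$sig': $!";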

Folder structure looks like

/
  usr/
    local/
      media/
        1f/
          f8/
            a7/
              1ff8a7b5dc7a7d1f0ed65aaa29c04b1e
JDrago
This answer is incorrect, per Nohat, below.
Rob
Thanks for pointing out the lack of clarity. Fixed now.
JDrago
+7  A: 

The accepted answer is incorrect:

  • The MD5 sum of a string is 32 hex characters (128 bits), not 16
  • The file path is calculated from the MD5 sum of the filename, not the contents of the file itself
  • The first directory in the path is the first hex character of the hash, and the second directory is the first two characters. The directory path is not built from the first 3 or 6 characters.
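A minimal Perl sketch of the two-level layout described in the list, assuming the input is the file name exactly as MediaWiki stores it (for example the `img_name` value from the `image` table). `hashed_dir` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Two-level hashed directory: level 1 is the first hex character of
# md5(file name), level 2 is the first two hex characters.
sub hashed_dir {
    my ($name) = @_;
    my $hash = md5_hex($name);    # hash of the name, not the file contents
    return substr($hash, 0, 1) . '/' . substr($hash, 0, 2) . '/';
}

print hashed_dir('Herbs.jpg'), "\n";
```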

The MD5 sum of 'Herbs.jpg' is fceaa5e7250d5036ad8cede5ce7d32d6. The first 2 characters are 'fc', giving the file path f/fc/, which is what is given in the example.
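To rebuild full URLs like the one in the question, a sketch along these lines may work. It assumes a default MediaWiki setup with hashed upload directories enabled, originals under a base upload URL, and thumbnails under `thumb/<hash dirs>/<name>/<width>px-<name>`. The base URL `http://example.org/images` and the helper name `image_urls` are hypothetical placeholders; check them against your wiki's configuration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Assumed default layout (not stated in the thread):
#   original:  $base/<h1>/<h1h2>/<name>
#   thumbnail: $base/thumb/<h1>/<h1h2>/<name>/<width>px-<name>
sub image_urls {
    my ($base, $title, $width) = @_;

    # Normalize a display title to the stored name (assumption: spaces
    # become underscores and the first letter is upper-cased).
    (my $name = $title) =~ tr/ /_/;
    $name = ucfirst $name;

    my $hash = md5_hex($name);    # MD5 of the name, not the contents
    my $dirs = substr($hash, 0, 1) . '/' . substr($hash, 0, 2);

    my $original = "$base/$dirs/$name";
    my $thumb    = "$base/thumb/$dirs/$name/${width}px-$name";
    return ($original, $thumb);
}

my ($orig, $thumb) = image_urls('http://example.org/images', 'Herbs.jpg', 300);
print "$orig\n$thumb\n";
```

The thumbnail form corresponds to the `.../f/fc/Herbs.jpg/300px-Herbs.jpg` path in the question. If the name comes from the `img_name` column it is already normalized. Names with non-ASCII characters should be hashed as UTF-8 bytes (`md5_hex` dies on wide characters), and they still need percent-encoding before being used in a URL.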

nohat
I'd been racking my brains over the actual server path for 3 days! Thanks, man :)
Crimson
+2  A: 

In PHP (from within MediaWiki) you can call the following to get the URL. You may want to read the PHP source to see how the path is calculated.

$url = wfFindFile(Title::makeTitle(NS_IMAGE, $fileName))->getURL();
gradbot
Thanks a lot :) that was immensely helpful!
Crimson