I have a directory called "images" filled with about one million images. Yep.

I want to write a shell command to rename all of those images into the following format:

original: filename.jpg
new: /f/i/l/filename.jpg

Any suggestions?

Thanks,
Dan

+2  A: 

You can generate the new file name using, e.g., sed:

$ echo "test.jpg" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'
t/e/s/test.jpg

So, you can do something like this (assuming all the directories are already created):

for f in *; do
   mv -i "$f" "$(echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/')"
done

or, if your shell doesn't support the `$(...)` syntax:

for f in *; do
   mv -i "$f" "`echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/'`"
done
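If the directories aren't already created, a variant along the same lines can make them on the fly with mkdir -p (a sketch, untested at the million-file scale; the -f test skips subdirectories, and names shorter than three characters are left where they are because the sed pattern won't match them):

```shell
for f in *; do
   [ -f "$f" ] || continue                       # skip subdirectories
   new="$(echo "$f" | sed -e 's/^\(\(.\)\(.\)\(.\).*\)$/\2\/\3\/\4\/\1/')"
   [ "$new" = "$f" ] && continue                 # name too short to split
   mkdir -p "$(dirname "$new")"                  # e.g. t/e/s for test.jpg
   mv -i "$f" "$new"
done
```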

However, considering the number of files, you may just want to use perl as that's a lot of sed and mv processes to spawn:

#!/usr/bin/perl -w
use strict;

# warning: untested
opendir DIR, "." or die "opendir: $!";
my @files = readdir(DIR); # can't change dir while reading: read in advance
closedir DIR;
foreach my $f (@files) {
    (my $new_name = $f) =~ s!^((.)(.)(.).*)$!$2/$3/$4/$1!;
    -e $new_name and die "$new_name already exists";
    rename($f, $new_name);
}

That Perl rename() is limited to moves within the same filesystem, though you can use File::Copy::move to get around that.

derobert
thanks for that solution, it is an interesting approach
Dan
oh, I notice one thing that testing would have found: there needs to be an "is this a file?" test so it doesn't move the directories. Fairly easy to fix (e.g., `-f $f or next;` at the top of the Perl foreach loop; similar in the shell loop)
derobert
+1  A: 

I suggest a short python script. Most shell tools will balk at that much input (though xargs may do the trick). Will update with example in a sec.

#!/usr/bin/python
import os, shutil

src_dir = '/src/dir'
dest_dir = '/dest/dir'

for fn in os.listdir(src_dir):
  if len(fn) < 3 or not os.path.isfile(os.path.join(src_dir, fn)):
    continue  # skip directories and names too short to split
  target = os.path.join(dest_dir, fn[0], fn[1], fn[2])
  if not os.path.isdir(target):
    os.makedirs(target)  # makedirs raises if the directory already exists
  shutil.copyfile(os.path.join(src_dir, fn), os.path.join(target, fn))
SpliFF
thanks, looks like a wonderful solution. I need to wait for the files to transfer to my new server before I can try it out (ETA 50 hours lol)
Dan
+3  A: 
for i in *.*; do mkdir -p ${i:0:1}/${i:1:1}/${i:2:1}/; mv $i ${i:0:1}/${i:1:1}/${i:2:1}/; done;

The ${i:0:1}/${i:1:1}/${i:2:1} part could probably be a variable, or shorter or different, but the command above gets the job done. You'll probably face performance issues, but if you really want to use it, narrow *.* to fewer matches (a*.*, b*.*, or whatever fits).
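The narrowing idea might be sketched as an outer loop over prefixes (the prefix list here is an assumption; extend it to cover digits and the rest of the alphabet, and note that ${i:0:1} is still a bash-ism):

```shell
for p in a b c; do                       # extend the prefix list as needed
  for i in "$p"*.*; do
    [ -e "$i" ] || continue              # glob matched nothing
    mkdir -p "${i:0:1}/${i:1:1}/${i:2:1}"
    mv "$i" "${i:0:1}/${i:1:1}/${i:2:1}/"
  done
done
```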

edit: added a $ before i for mv, as noted by Dan

inerte
FYI, the `${i:0:1}` syntax is a bash-ism, which is probably OK on Linux, but just in case...
derobert
If there are a few directories in the folder, will this loop include them as well?
Dan
Needed one correction:for i in *.*; do mkdir -p ${i:0:1}/${i:1:1}/${i:2:1}/; mv $i ${i:0:1}/${i:1:1}/${i:2:1}/; done;
Dan
Only directories with dots in them!
Chris Huang-Leaver
+2  A: 

You can do it as a bash script:

#!/bin/bash

base=base

mkdir -p $base/shorts

for n in *
do
    if [ ${#n} -lt 3 ]
    then
        mv $n $base/shorts
    else
        dir=$base/${n:0:1}/${n:1:1}/${n:2:1}
        mkdir -p $dir
        mv $n $dir
    fi
done

Needless to say, you might need to worry about spaces in filenames; names shorter than three characters end up in $base/shorts.

notnoop
very nice solution, thank you
Dan
A: 

Any of the proposed solutions which use a wildcard syntax in the shell will likely fail due to the sheer number of files you have. Of the current proposed solutions, the perl one is probably the best.

However, you can easily adapt any of the shell script methods to deal with any number of files thus:

ls -1 | \
while read filename
do
  # insert the loop body of your preference here, operating on "filename"
done
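For instance, pairing that loop with a body that creates the target directory might look like this (a sketch; IFS= and read -r keep unusual filenames intact, and the single-character extraction uses cut to stay portable beyond bash):

```shell
ls -1 | while IFS= read -r filename; do
  [ -f "$filename" ] || continue         # skip directories
  a=$(printf '%s' "$filename" | cut -c1)
  b=$(printf '%s' "$filename" | cut -c2)
  c=$(printf '%s' "$filename" | cut -c3)
  mkdir -p "$a/$b/$c"
  mv "$filename" "$a/$b/$c/"
done
```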

I would still use perl, but if you're limited to only having simple unix tools around, then combining one of the above shell solutions with a loop like I've shown should get you there. It'll be slow, though.

Chris Cleeland
Wildcard syntax should be fine: it's a shell built-in, and on purpose it's not being passed on the command line to a program (otherwise, surely, the command line would be too long). for i in `seq 1 1000000` works, for example.
derobert
I just tested: using `for f in *` works just fine with 1,000,000 files. Slow, but it works.
derobert
thanks for your commentary, it was helpful as I am very new to shell scripting
Dan
@derobert: thanks for testing that out and confirming that it *does* work. This is apparently a case where lessons learned in The Old Days are no longer necessarily true; bash has apparently improved in that respect. I know for a fact that it failed in various ways under the Bourne shell, but that was back in the late '80s/early '90s, when I first made the mistake while writing a script to do some maintenance on NetNews directories.
Chris Cleeland