Apache version: 2.2.11 (Unix); architecture: x86_64; operating system: Linux; kernel version: 2.6.18-164.el5

OK, here is what I have working. However, I may not use File::Util for anything else in the script.

My directory names are eight digits, starting at 10000000. As a double check, I was comparing the highest number found against the most recently created directory (using stat), but I believe that is overkill.

Another issue is that I did not know how to put a regex into the list_dir call so that only eight-digit names, such as those matched by m!^([0-9]{8})\z!x, would be returned. The example in the documentation reads ....'--pattern=\.txt$'), but my attempt, '--pattern=m!^([0-9]{8})\z!x)', was futile.

So, is there a "better" way to grab the latest directory?

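#!/usr/bin/perl
use strict;
use warnings;
use File::Util;

my $f    = File::Util->new();
# List the entries in topdir, skipping '.' and '..'
my @dirs = $f->list_dir('/home/accountname/public_html/topdir', '--no-fsdots');
# Sort numerically, highest first, and add 1 to the highest
my @last = sort { $b <=> $a } @dirs;
my $new  = $last[0] + 1;
print "Content-type: text/html\n\n";
print "I will now create dir $new\n";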

And how would I ignore anything that does not match my regex?

I was thinking an answer might involve ls -d as well, but as a beginner I am new to making system calls from a script (if that is in fact what that would be ;-) ).

So, more specifically: what is the best way to open a directory and return the name of the latest eight-digit directory in it, ignoring everything else, then increase that name by 1 and create the new directory? Which is more efficient: stat or the actual eight-digit name? (The directory names are going to be eight digits either way.) Is it better to use File::Util or just built-in Perl functions?

Thanks to everyone in advance. I have learned so much here.

A:

Best way to open a directory, return the name of the latest 8 digit directory in that directory ignoring all else. Increase the 8 digit dir name by 1 and create the new directory. Whichever is most efficient: stat or actual 8 digit file name?

First, I should point out that having tens of millions of subdirectories in a single directory (eight-digit names allow up to 90,000,000 of them, 10000000 through 99999999) is likely to be very inefficient.

  1. How do you get only the directory names that consist of eight digits?

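    use File::Slurp;
    use File::Spec::Functions qw( catdir );
    my $top  = '/home/accountname/public_html/topdir';
    my @dirs = grep { /\A[0-9]{8}\z/ and -d catdir($top, $_) } read_dir $top;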
    
  2. How do you get the largest?

    use List::Util qw( max );
    my $latest = max @dirs;
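For comparison, here is a sketch of the same two steps using only the built-in opendir/readdir plus the core modules File::Spec and List::Util, which also bears on whether File::Util is needed at all. The function name latest_numbered_dir is made up for illustration.

```perl
use strict;
use warnings;

use File::Spec::Functions qw( catdir );
use List::Util qw( max );

# Hypothetical helper: return the highest eight-digit subdirectory name
# in $top, or undef if there is none. Uses only opendir/readdir.
sub latest_numbered_dir {
    my ($top) = @_;
    opendir my $dh, $top or die "Cannot open '$top': $!";
    # readdir returns bare names, so -d needs the full path; the cheap
    # name check runs first so -d is only called on candidates.
    my @dirs = grep { /\A[0-9]{8}\z/ and -d catdir($top, $_) } readdir $dh;
    closedir $dh;
    return max @dirs;    # max of an empty list is undef
}
```

Usage would be something like my $latest = latest_numbered_dir('/home/accountname/public_html/topdir');. Matching the name before calling -d keeps the number of stat calls down when the directory also contains other entries.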
    

Now, the problem is that between determining $latest and attempting to create the directory, some other process can create the same directory. So, I would use $latest as the starting point and keep trying to create the next directory until I succeed or run out of numbers.

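#!/usr/bin/perl

use strict;
use warnings;

use File::Slurp;
use File::Spec::Functions qw( catdir );
use List::Util qw( max );

# Create the next eight-digit directory under $top and return its path.
# If another process takes a number first, mkdir fails with EEXIST
# and we simply try the next one.
sub make_numbered_dir {
    my $max = 100_000_000;
    my $top = '/home/accountname/public_html/topdir';
    my $latest = max grep { /\A[0-9]{8}\z/ } read_dir $top;
    $latest = 9_999_999 unless defined $latest;   # first dir is 10000000

    while ( ++$latest < $max ) {
        my $dir = catdir($top, sprintf '%08d', $latest);
        return $dir if mkdir $dir;
        # Give up on real errors (e.g. permissions) instead of looping on.
        die "Cannot create '$dir': $!" unless $!{EEXIST};
    }
    return;   # ran out of eight-digit numbers
}

my $new = make_numbered_dir() or die "Ran out of eight-digit names\n";
print "Created $new\n";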

If you do it the way I originally recommended (in an earlier version of this answer), you will invoke mkdir far too many times.

As for how you use File::Util::list_dir to filter entries:

#!/usr/bin/perl

use strict;
use warnings;

use File::Util;

my $fu = File::Util->new;

print "$_\n" for $fu->list_dir('.',
    '--no-fsdots',
    '--pattern=\A[0-9]{8}\z'
);
C:\Temp> ks
10001010
12345678

However, I must point out that I did not much like this module in the few minutes I spent with it, especially the module author's obsession with invoking methods and functions in list context. I do not think I will be using it again.

Sinan Ünür
@Sinan: Thank you. I will play around with those. I really wish I could figure this stuff out on my own. You guys are great here. At first glance, does your first example in fact increment the highest number by 1 and create that directory? I am having trouble understanding the mechanics "verbosely". ;-)
Jim_Bo
@brian d foy The answer I gave was crap even without the `do {} while ()` silliness. ;-)
Sinan Ünür
I think there is something missing in the File::Slurp example. Grep needs some input.
brian d foy
@Jim_Bo: Thank you for accepting my answer. Remember, it takes time to become comfortable with a programming language. Keep working on it.
Sinan Ünür
I guess I was too quick to accept the answer. Before, I would test the answers and then accept. I was repeatedly "prompted" to accept answers before my testing was done, which the person prompting did not know. However, the answers always worked on some level, so I guess I took that for granted and accepted this answer before testing it properly. I must be learning, though, because there was a "flag" for me on the original answer. I am never too proud to ask for a more detailed explanation. Even if the code works, I will not use it if I don't fully understand it. Thanks again for catering to this 47-year-old student.
Jim_Bo
@Jim_Bo Don't be pressured into accepting an answer immediately. Try it out first, ask further probing questions and then, if you get an answer that solves your question, vote it up and accept the answer to show your appreciation. Now, there is no reason anyone should be pressuring you to accept an answer within a few minutes of you posting your question.
Sinan Ünür
@Sinan: You are quite welcome. Your answer v1.01.01 is really nice. Even though there were "problems" with your initial answer, it was still useful because I learned from it. Hey, how often can you learn from someone else's mistakes? The latest answer here actually seems to need no further explanation. If I do run into a part I don't understand, I will not hesitate to ask, as usual.
Jim_Bo
@Jim_Bo: That's what mistakes are for.
Sinan Ünür
I think I'm looking at the latest version, and the first code example looks incomplete.
brian d foy
@brian d foy: Arrrrrggggggghhhhhhhhhhhhhhh!!!!! Fixed.
Sinan Ünür
@Sinan I just envisioned you with a parrot on your shoulder.
Jim_Bo
A:

What are you doing? It sounds really weird and fraught with danger. I certainly wouldn't want to let a CGI script create new directories. There might be a better solution for what you are trying to achieve.

How many directories do you expect to have? The more entries you have in any directory, the slower things are going to get. You should work out a scheme where you hash things into a directory structure that spreads out the files so that no directory holds that many items. Say, if you have the name '0123456789', you create a directory structure like:

 0/01/0123456789

You can have as many directory levels as you like. See the directory structure of CPAN, for instance. My author name is BDFOY, so my author directory is authors/id/B/BD/BDFOY. That way there isn't any directory that has a large number of entries (unless your author id is ADAMK or RJBS).
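A minimal sketch of that layout in Perl, assuming the prefixes are taken from the first one, two, ... characters of the name (the helper hashed_path and its $levels argument are hypothetical, not part of any module):

```perl
use strict;
use warnings;

use File::Spec::Functions qw( catdir );

# Hypothetical helper: build a CPAN-style hashed path such as
# B/BD/BDFOY or 0/01/0123456789 by prefixing the name with
# directories made from its first 1, 2, ... $levels characters.
sub hashed_path {
    my ($name, $levels) = @_;
    $levels = 2 unless defined $levels;
    die "Name '$name' is too short for $levels levels\n"
        if length($name) < $levels;
    my @parts = map { substr($name, 0, $_) } 1 .. $levels;
    return catdir(@parts, $name);
}
```

For example, hashed_path('0123456789') gives 0/01/0123456789 on Unix. To actually create the nested directories in one go, the result could be handed to mkpath from the core File::Path module.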

You also have a potential contention issue to work out. Between the time you discover the latest directory and the time you try to make the next one, another process might already have created it.

As for the task at hand, I think I'd punt to system for this one if you are going to have a million directories. With something like:

ls -t -d -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -1

I don't think you'll be able to get any faster than ls for this task. If there are a large number of directories, the cost of the fork should be outweighed by the work you have to do to go through everything yourself.
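If you do punt to ls from Perl, a sketch like the one below may help. It is a variant of the one-liner above, not a drop-in copy: it runs ls -1 -t on the directory through a list-form pipe open (no shell, so the path needs no quoting) and does the eight-digit filtering in Perl instead of with a shell glob, which avoids a possible "Argument list too long" error when a glob expands to a huge number of names. Like the original, -t orders by modification time, so this finds the most recently modified matching entry, which is not necessarily the highest number (a directory's mtime changes when entries inside it change), and it does not check that the entry is a directory. The name latest_by_ls is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the most recently modified entry in $top
# whose name is exactly eight digits, or undef if there is none.
# List-form pipe open: no shell is involved, so $top needs no quoting.
sub latest_by_ls {
    my ($top) = @_;
    open my $ls, '-|', 'ls', '-1', '-t', $top
        or die "Cannot run ls: $!";
    my $found;
    while (my $name = <$ls>) {
        chomp $name;
        if ($name =~ /\A[0-9]{8}\z/) {
            $found = $name;
            last;    # ls -t already lists newest first
        }
    }
    close $ls;       # may report SIGPIPE if we stopped early; ignored
    return $found;
}
```

Stopping at the first match saves Perl from reading the rest of the listing, although ls itself still has to read and sort every entry.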

I suspect, however, that what you really need is some sort of database.

brian d foy
Yes, this was a big concern of mine. With flock and LOCK_EX (or any of a plethora of modules) I can get around the issue when working with a file, but when reading directories I am stuck. I read that ls was the way to go, but I was unsure how to implement it. I assumed something like system(perl ls -d ..., and the rest is past my ability), but I have not worked with direct system commands yet. I will try to figure out what you posted in your answer and how I would actually implement it in a script. Thanks @brian... Could be worse though, e.g. PHP. ;-)
Jim_Bo
I'm not fond of cheap shots at PHP. I've seen it used well, just like any other language.
brian d foy
@brian, it was not meant as a shot at PHP. I understand PHP even less than Perl, so the shot was at my own ability, and obviously now one in the foot as well.
Jim_Bo
@brian, I was actually working on a snippet that creates a new top directory when the original top directory contains 500 directories. I put that on the back burner because 500 was a guess, and that is not acceptable. It still did not solve the concurrency issue either; it just helped avoid it a bit. I was in the process of learning how many I should allow and why. I also assumed I should do some "time trials" on my server, reading that directory and its contents, to make the determination, but my main issue was the one that motivated the question at the top.
Jim_Bo
I am having difficulty finding documentation on ls to figure this answer out. A breakdown of the above, or a link in the right direction, would be quite helpful, if not essential, in my case.
Jim_Bo
Look at the man page for ls, then the man page for head.
brian d foy
Thank you @brian. -t = sort by modification time, newest first; -d = list directories themselves rather than their contents; -1 = one entry per line; head = output the first part of its input; -1 tells head to output only the first line. Very cool! So: my $last = `ls -t -d -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -1`; RIGHT?? If so, then: my $last = `cd $topdir; ls -t -d -1 [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -1`; (This reminds me of my DOS days.) Am I close???
Jim_Bo
@brian, I was able to barter for all the books listed in your profile (no $income due to my "limitations") by setting up yet another WordPress site on my server for a local book nook. I will be receiving them next week! Hopefully the books will reduce some of the time I need to spend here asking silly questions.
Jim_Bo
@brian: Your example above is most awesome. I have dived headfirst into finding and learning these system commands now. So much better than the way I was doing it. I am having trouble finding a list of them, though. I can find individual docs, but not a comprehensive list of all of them (or the most common) with their switches and/or examples. Maybe I will find that info in one of the books. Any links will help for now if you know of any.
Jim_Bo