I have an image file name that consists of four parts:

  1. $Directory (the directory where the image exists)
  2. $Name (for an art site, this is the painting's name reference #)
  3. $File (the image's file name minus extension)
  4. $Extension (the image's extension)

$example = 100020003000.png

which I would like broken down as follows:

$dir=1000 $name=2000 $file=3000 $ext=.png

I was wondering if substr is the best option for breaking up the incoming $example so I can work with the four variables: validation/error checking, grabbing the verbose name from its $Name assignment, or whatever. I found this post: "Is unpack faster than substr?"

So, in my beginner's "stone tool" approach:

my $example = "100020003000.png";
my $dir = substr($example, 0,4);
my $name = substr($example, 4,4);  # offsets are zero-based, so the second field starts at 4
my $file = substr($example, 8,4);
my $ext = substr($example, 13,3); # skips the "." at offset 12; will add the "." later

So, can I use unpack, or maybe even another approach that would be more efficient?

I would also like to avoid loading any modules unless doing so would use less resources for some reason. Mods are great tools I luv'em but, I think not necessary here.

I realize I should probably push the vars into an array/hash but, I am really a beginner here and I would need further instruction on how to do that and how to pull them back out.
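
For the hash part of the question, here is a minimal sketch. The hash name %part and its keys are illustrative choices, not anything taken from the posts. A hash slice lets unpack fill all four fields in one assignment, and each value is read back by key:

```perl
use strict;
use warnings;

my $example = "100020003000.png";

# Hypothetical hash name and keys, chosen only for illustration.
my %part;
@part{qw(dir name file ext)} = unpack 'A4 A4 A4 A4', $example;

# Pull individual values back out by key.
print "dir=$part{dir} name=$part{name} file=$part{file} ext=$part{ext}\n";

# Or walk the keys in a fixed order.
for my $key (qw(dir name file ext)) {
    print "$key => $part{$key}\n";
}
```

Whitespace inside an unpack template is ignored, so 'A4 A4 A4 A4' is just a more readable spelling of 'A4A4A4A4'.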

Thanks to everyone at stackoverflow.com!

+7  A: 

I'd just use a regex for that:

my ($dir, $name, $file, $ext) = $path =~ m:(.*)/(.*)/(.*)\.(.*):;

Or, to match your specific example:

my ($dir, $name, $file, $ext) = $example =~ m:^(\d{4})(\d{4})(\d{4})\.(.{3})$:;
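
Since the question mentions validation, one possible sketch of how the anchored pattern could double as a format check. The helper name parse_image_name and the second test string are assumptions made for illustration; the pattern is the one above, written with / delimiters. A failed match returns an empty list, so the list assignment is false in the if condition:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the four parts, or an empty list if the
# name does not fit the 4+4+4 digits plus 3-character extension layout.
sub parse_image_name {
    my ($candidate) = @_;
    return $candidate =~ /^(\d{4})(\d{4})(\d{4})\.(.{3})$/;
}

for my $candidate ("100020003000.png", "1000-bad-name.png") {
    if (my ($dir, $name, $file, $ext) = parse_image_name($candidate)) {
        print "ok: dir=$dir name=$name file=$file ext=$ext\n";
    }
    else {
        print "rejected: $candidate\n";
    }
}
```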
Chris Jester-Young
Excellent! Another great answer. Thanks very much. Now, which one to use?
Jim_Bo
I wonder which approach would actually be faster? I struggled with which answer to check but, I guess since the title is "can I use unpack instead...." the check goes there. I'll give you useful bumps here for sure though! ;-)
Jim_Bo
It depends. If you have a list of a whole lot of file names in this fixed format (as it seems from your question), use `unpack`. Otherwise, use regex.
Sinan Ünür
Yeah, this one is a one hit wonder. Only one $example will need to be processed. So, regex is prob the best choice. However, when my friend wants to view all his paintings in the directory, or view his hit data, I will try to utilize unpack. Believe me, I will be back here to see if I did it right!
Jim_Bo
+11  A: 

Absolutely:

my $example = "100020003000.png";
my ($dir, $name, $file, $ext) = unpack 'A4' x 4, $example;

print "$dir\t$name\t$file\t$ext\n";

Output:

1000    2000    3000    .png
Sinan Ünür
+1 for actually providing a good pack-based answer, even if I like the idea of using regexes more. :-)
Chris Jester-Young
@Sinan Very nice. That explains very well to me how unpack works. I read the docs on it today but could not find an example that made sense to me. Thanks so much! KUDOS!
Jim_Bo
A: 

Both substr and unpack bias your thinking toward fixed-layout, while regex solutions are more oriented toward flexible layouts with delimiters.

The example you gave appeared to be fixed layout, but directories are usually separated from file names by a delimiter (e.g. slash for POSIX-style file systems, backslash for MS-DOS, etc.). So you might actually have a case for both: a regex solution to split directory and file name apart (or even directory/name/extension), and then a fixed-length approach for the name part by itself.
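
A rough sketch of that combination, assuming a hypothetical slash-delimited layout such as "1000/20003000.png" (the question itself uses no delimiter between directory and name). The regex handles the delimited parts and unpack handles the fixed-width base name:

```perl
use strict;
use warnings;

# Hypothetical delimited layout: directory, then a fixed-width base name.
my $path = "1000/20003000.png";

# Regex for the delimited parts: directory, base name, extension.
my ($dir, $base, $ext) = $path =~ m{^(.*)/([^/]+)\.([^.]+)$}
    or die "unexpected path layout: $path\n";

# unpack for the fixed-width part: two fields of 4 characters each.
my ($name, $file) = unpack 'A4 A4', $base;

print "dir=$dir name=$name file=$file ext=$ext\n";
```

If the directory depth or the extension length changes, only the regex needs to follow; the fixed-width template stays tied to the base name alone.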

joel.neely
I must be learning, I actually understand what you posted.
Jim_Bo
+3  A: 

Using unpack is good, but since the elements are all the same width, the regex is very simple as well:

my $example = "100020003000.png";
my ($dir, $name, $file, $ext) = $example =~ /(.{4})/g;
FM
Man, keeps getting smaller and smaller! Thanks @FM ! That one goes in my Perl bible too. I kept each area (var) in sets of four chars because I knew it would benefit somehow. You just showed me why! Thanks!
Jim_Bo
+2  A: 

It isn't unpack, but since you have groups of 4 characters, you could use a limited split, with a capture:

my ($dir, $name, $file, $ext) = grep length, split /(....)/, $filename, 4;

This is pretty obfuscated, so I probably wouldn't use it, but the capture in a split is an often overlooked ability.

So, here's an explanation of what this code does:

Step 1. split with capturing parentheses adds the values captured by the pattern to its output stream. The stream contains a mix of fields and delimiters.

qw( a 1 b 2 c 3 ) == split /(\d)/, 'a1b2c3';

Step 2. split with 3 args limits how many times the string is split.

qw( a b2c3 ) == split /\d/, 'a1b2c3', 2;

Step 3. Now, when we use a delimiter pattern that matches pretty much anything /(....)/, we get a bunch of empty (0 length) strings. I've marked delimiters with D characters, and fields with F:

 ( '', 'a', '', '1', '', 'b', '', '2' ) == split /(.)/, 'a1b2';
   F    D   F    D   F    D   F    D

Step 4. So if we limit the number of fields to 3 we get:

 ( '', 'a', '', '1', 'b2' ) == split /(.)/, 'a1b2', 3;
   F    D   F    D   F

Step 5. Putting it all together we can do this (I used a .jpeg extension so that the extension would be longer than 4 characters):

 ( '', 1000, '', 2000, '', 3000, '.jpeg' ) == split /(....)/, '100020003000.jpeg', 4;
   F   D     F   D     F   D     F

Step 6. Step 5 is almost perfect, all we need to do is strip out the null strings and we're good:

( 1000, 2000, 3000, '.jpeg' ) == grep length, split /(....)/, '100020003000.jpeg', 4;
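
The `==` lines in the steps above are shorthand rather than runnable Perl. A small sketch that actually runs each split could look like this; the helper show and the $stepN variable names are made up for the illustration, and results are joined with '|' so the empty strings stay visible:

```perl
use strict;
use warnings;

# Hypothetical helper: join a list with '|' so empty fields can be seen.
sub show { return join '|', @_ }

my $step1 = show( split /(\d)/, 'a1b2c3' );     # captured delimiters are kept
my $step2 = show( split /\d/, 'a1b2c3', 2 );    # limit of 2 fields
my $step3 = show( split /(.)/, 'a1b2' );        # empty fields between delimiters
my $step4 = show( split /(.)/, 'a1b2', 3 );     # limit of 3 fields
my $step5 = show( split /(....)/, '100020003000.jpeg', 4 );
my $step6 = show( grep { length } split /(....)/, '100020003000.jpeg', 4 );

print "$_\n" for $step1, $step2, $step3, $step4, $step5, $step6;
```

Note that split drops trailing empty fields when no limit is given, which is why step 3 ends with the captured '2' rather than another empty string.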

This code works, and it is interesting. But it's not any more compact than any of the other solutions. I haven't benchmarked it, but I'd be very surprised if it wins any speed or memory efficiency prizes.

But the real issue is that it is too tricky to be good for real code. Using split to capture delimiters (and maybe one final field), while throwing out the field data is just too weird. It's also fragile: if one field changes length the code is broken and has to be rewritten.

So, don't actually do this.

At least it provided an opportunity to explore some lesser known features of split.

daotoad