ansaurus

Question

Can I use Perl's unpack to break up a string into vars?

Answer 1

+7 A:

I'd just use a regex for that:

my ($dir, $name, $file, $ext) = $path =~ m:(.*)/(.*)/(.*)\.(.*):;

Or, to match your specific example:

my ($dir, $name, $file, $ext) = $example =~ m:^(\d{4})(\d{4})(\d{4})\.(.{3})$:;

Chris Jester-Young 2009-10-07 21:57:58

Excellent! Another great answer. Thanks very much. Now, which one to use?

Jim_Bo 2009-10-07 22:11:27

I wonder which approach would actually be faster? I struggled with which answer to check, but I guess since the title is "can I use unpack instead....", the check goes there. I'll give you useful bumps here for sure though! ;-)

Jim_Bo 2009-10-07 22:23:12

It depends. If you have a list of a whole lot of file names in this fixed format (as it seems from your question), use `unpack`. Otherwise, use regex.

Sinan Ünür 2009-10-07 22:23:56

Yeah, this one is a one hit wonder. Only one $example will need to be processed. So, regex is prob the best choice. However, when my friend wants to view all his paintings in the directory, or view his hit data, I will try to utilize unpack. Believe me, I will be back here to see if I did it right!

Jim_Bo 2009-10-07 22:35:45
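
The speed question raised in the comments above can be measured rather than guessed. The following is a minimal sketch using the core Benchmark module; the labels and the run time of one CPU second per case are arbitrary choices, and the results will vary by machine and Perl version. Note that the regex from this answer captures "png" without the dot, while the unpack template returns ".png".

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $example = "100020003000.png";

# Run each approach once so the results can be inspected.
my @by_unpack = unpack 'A4' x 4, $example;
my @by_regex  = $example =~ /^(\d{4})(\d{4})(\d{4})\.(.{3})$/;

print "unpack: @by_unpack\n";
print "regex:  @by_regex\n";

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    'unpack' => sub { my @f = unpack 'A4' x 4, $example },
    'regex'  => sub { my @f = $example =~ /^(\d{4})(\d{4})(\d{4})\.(.{3})$/ },
});
```

For a single file name the difference is unlikely to matter; a measurement like this only becomes interesting when many names are processed.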

Answer 2

+11 A:

Absolutely:

my $example = "100020003000.png";
my ($dir, $name, $file, $ext) = unpack 'A4' x 4, $example;

print "$dir\t$name\t$file\t$ext\n";

Output:

1000    2000    3000    .png
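
To spell out what the template does: 'A4' x 4 is ordinary string repetition, so unpack sees 'A4A4A4A4', that is, four text fields of four characters each. The sketch below prints the expanded template and also shows a variation (my own assumption, not part of the answer) that uses 'A*' for the last field so the extension can be any length.

```perl
use strict;
use warnings;

my $example = "100020003000.png";

# The x operator repeats the string, giving the template 'A4A4A4A4'.
my $template = 'A4' x 4;
my @fixed    = unpack $template, $example;

# Hypothetical variation: 'A*' takes whatever remains, so the
# extension does not have to be exactly four characters long.
# Whitespace inside a template is ignored.
my @flexible = unpack 'A4 A4 A4 A*', "100020003000.jpeg";

print "$template\n";
print join("\t", @fixed), "\n";
print join("\t", @flexible), "\n";
```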

Sinan Ünür 2009-10-07 22:00:04

+1 for actually providing a good pack-based answer, even if I like the idea of using regexes more. :-)

Chris Jester-Young 2009-10-07 22:01:28

@Sinan Very nice. That explains to me how unpack works very well. I read the docs on it today but could not find an example that made sense to me. Thanks so much! KUDOS!

Jim_Bo 2009-10-07 22:08:54

Answer 3

A:

Both substr and unpack bias your thinking toward fixed-layout, while regex solutions are more oriented toward flexible layouts with delimiters.

The example you gave appeared to be fixed layout, but directories are usually separated from file names by a delimiter (e.g. a slash on POSIX-style file systems, a backslash on MS-DOS, etc.). So you might actually have a case for both: a regex solution to split the directory and file name apart (or even directory/name/extension), and then a fixed-length approach for the name part by itself.
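
A rough sketch of that combination, assuming a POSIX-style path whose base name uses the fixed 4-4-4 layout from the question. The path and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical path: delimited directory part, fixed-width base name.
my $path = "/home/jim/gallery/100020003000.png";

# Regex for the delimited parts: directory, base name, extension.
my ($dir, $base, $ext) = $path =~ m{^(.*)/([^/]+)\.([^./]+)$}
    or die "Unexpected path format: $path\n";

# unpack for the fixed-width part: three 4-character fields.
my @fields = unpack 'A4' x 3, $base;

print "dir:    $dir\n";
print "fields: @fields\n";
print "ext:    $ext\n";
```

If the directory layout changes, only the regex needs to be touched; if the field widths change, only the unpack template does.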

joel.neely 2009-10-07 22:10:23

I must be learning, I actually understand what you posted.

Jim_Bo 2009-10-07 22:17:49

Answer 4

+3 A:

Using unpack is good, but since the elements are all the same width, the regex is very simple as well:

my $example = "100020003000.png";
my ($dir, $name, $file, $ext) = $example =~ /(.{4})/g;
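
One caveat worth checking before relying on this: the global match only returns complete groups of four characters, so it works here because ".png" happens to be exactly four characters long. The sketch below (the other file names are my own examples) shows what happens with longer and shorter extensions, plus an anchored pattern as one possible alternative.

```perl
use strict;
use warnings;

my @png  = "100020003000.png"  =~ /(.{4})/g;  # extension fills the last group
my @jpeg = "100020003000.jpeg" =~ /(.{4})/g;  # the trailing "g" is left over
my @pl   = "100020003000.pl"   =~ /(.{4})/g;  # ".pl" is too short to match

# Hypothetical alternative: anchor the pattern and let the last
# group take a dot followed by everything that remains.
my @anchored = "100020003000.jpeg" =~ /^(.{4})(.{4})(.{4})(\..+)$/;

print join("\t", @png), "\n";
print join("\t", @jpeg), "\n";
print join("\t", @pl), "\n";
print join("\t", @anchored), "\n";
```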

FM 2009-10-07 22:22:29

Man, keeps getting smaller and smaller! Thanks @FM ! That one goes in my Perl bible too. I kept each area (var) in sets of four chars because I knew it would benefit somehow. You just showed me why! Thanks!

Jim_Bo 2009-10-07 22:28:08

Answer 5

+2 A:

It isn't unpack, but since you have groups of 4 characters, you could use a limited split, with a capture:

my ($dir, $name, $file, $ext) = grep length, split /(....)/, $filename, 4;

This is pretty obfuscated, so I probably wouldn't use it, but the capture in a split is an often-overlooked ability.

So, here's an explanation of what this code does:

Step 1. split with capturing parentheses adds the values captured by the pattern to its output stream. The stream contains a mix of fields and delimiters.

qw( a 1 b 2 c 3 ) == split /(\d)/, 'a1b2c3';

Step 2. split with 3 args limits how many times the string is split.

qw( a b2c3 ) == split /\d/, 'a1b2c3', 2;

Step 3. Now, when we use a delimiter pattern that matches pretty much anything, such as /(.)/ or /(....)/, we get a bunch of empty (0 length) strings. I've marked delimiters with D characters, and fields with F:

 ( '', 'a', '', '1', '', 'b', '', '2' ) == split /(.)/, 'a1b2';
   F    D   F    D   F    D   F    D

Step 4. So if we limit the number of fields to 3 we get:

 ( '', 'a', '', '1', 'b2' ) == split /(.)/, 'a1b2', 3;
   F    D   F    D   F

Step 5. Putting it all together we can do this (I used a .jpeg extension so that the extension would be longer than 4 characters):

 ( '', 1000, '', 2000, '', 3000, '.jpeg' ) == split /(....)/, '100020003000.jpeg', 4;
   F   D     F   D     F   D     F

Step 6. Step 5 is almost perfect, all we need to do is strip out the null strings and we're good:

( 1000, 2000, 3000, '.jpeg' ) == grep length, split /(....)/, '100020003000.jpeg', 4;
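
The == lines in the steps above are shorthand, not runnable Perl. A small script along these lines can be used to check each step; the show helper and the @stepN names are just for illustration.

```perl
use strict;
use warnings;

# Render a list with quotes so that empty strings stay visible.
sub show { return '(' . join(', ', map("'$_'", @_)) . ')' }

my @step1 = split /(\d)/,   'a1b2c3';     # captured delimiters are kept
my @step2 = split /\d/,     'a1b2c3', 2;  # at most 2 fields
my @step3 = split /(.)/,    'a1b2';       # every character is a delimiter
my @step4 = split /(.)/,    'a1b2', 3;    # same, limited to 3 fields
my @step5 = split /(....)/, '100020003000.jpeg', 4;
my @step6 = grep { length $_ } @step5;    # drop the empty fields

for my $step (\@step1, \@step2, \@step3, \@step4, \@step5, \@step6) {
    my $rendered = show(@$step);
    print "$rendered\n";
}
```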

This code works, and it is interesting. But it's not any more compact than any of the other solutions. I haven't benchmarked it, but I'd be very surprised if it wins any speed or memory efficiency prizes.

But the real issue is that it is too tricky to be good for real code. Using split to capture delimiters (and maybe one final field), while throwing out the field data is just too weird. It's also fragile: if one field changes length the code is broken and has to be rewritten.

So, don't actually do this.

At least it provided an opportunity to explore some lesser-known features of split.

daotoad 2009-10-08 18:44:46
