The perlfunc entry for split says:

By default, empty leading fields are preserved

Hinting that there's a way to override that default, but later on all it says is:

Empty leading fields are produced when there are positive-width matches at the beginning of the string

...does this mean that there's no way to skip that first field?

It's not mission-critical, but I'm splitting a root-relative URL, say /foo/bar/, on the slashes and getting

['', 'foo', 'bar']

and wondering if there's a way to not get that blank first item.
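For reference, a minimal sketch of the behavior described above (the variable names are illustrative only). By default split keeps the empty field before the leading slash and drops empty trailing fields; a negative LIMIT keeps the trailing ones as well:

```perl
use strict;
use warnings;

my $url = '/foo/bar/';

# Default: the empty leading field is kept, empty trailing fields are dropped.
my @default = split m{/}, $url;

# A negative LIMIT keeps the empty trailing field as well.
my @keep_all = split m{/}, $url, -1;

print join( ',', map { "'$_'" } @default ),  "\n";
print join( ',', map { "'$_'" } @keep_all ), "\n";
```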

A: 

I would imagine you could just drop all "/" at the beginning of your URL.

NickLarsen
I tend to disfavor techniques that affect the original data as a side effect.
brian d foy
A: 

If you are splitting on the separator '/', then there is nothing wrong with the output. Since the first character is a delimiter, split has no choice but to produce an empty string before it.

Chad
A: 

One approach is to use a regexp to remove any leading delimiters.

e.g.

$str = "/foo/bar";
$str =~ s!^/*!!;

Then do your split as before.
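If changing the original string is a concern, one possible variation is to strip the leading slashes from a copy. This sketch assumes Perl 5.14 or later for the /r flag; the variable names are only for illustration:

```perl
use strict;
use warnings;

my $str = '/foo/bar';

# s///r returns the modified copy and leaves $str itself untouched.
my @parts = split m{/}, $str =~ s{^/+}{}r;

print "original: $str\n";
print "parts: @parts\n";
```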

Jeffrey Bird
+7  A: 

If you want to split up path elements, look at File::Spec or Path::Class, which handle all of the operating system specific stuff:

 use File::Spec;

 my( $root, @path_parts ) = File::Spec->splitdir( $path );

The nice thing about keeping the root is that you can go backward easily and still keep that leading slash (or whatever your operating system might use):

 my $path = File::Spec->catfile( $root, @path_parts );
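A small round-trip sketch of the two calls above. It uses File::Spec::Unix directly, which is my choice so that the result is the same on any platform; plain File::Spec picks the subclass for the local operating system. The sample path is an assumption:

```perl
use strict;
use warnings;
use File::Spec::Unix;

my $path = '/foo/bar';

# On Unix-style paths the root comes back as an empty string.
my ( $root, @path_parts ) = File::Spec::Unix->splitdir($path);

# Putting the pieces back together restores the leading slash.
my $rebuilt = File::Spec::Unix->catfile( $root, @path_parts );

print "root: '$root', parts: @path_parts, rebuilt: $rebuilt\n";
```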

This isn't such a big deal with URLs since they all use a Unix-like path specification. Still, it's easy to construct the local path in the same way, and remember where the root is (which may be important on Windows, VMS, etc.):

 my ($docroot_root, @doc_root ) = File::Spec->splitdir( $ENV{DOCUMENT_ROOT} );
 my $local_path = File::Spec->catfile( $docroot_root, @doc_root, @path_parts );

Otherwise, you're stuck with what split does. It assumes that you care about the position of fields, so it preserves their position (i.e. the thing before the first separator is always position 0 in the list, even if it is empty). For your problem, I tend to write it as a list assignment where I use a variable to soak up the initial empty field, just as in the splitdir example above:

 my( $root, @path_parts ) = split m|/|, $path;
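One caveat with soaking up the first field: if the path has no leading slash, the first variable receives a real element. A possible sketch that discards the first field only when it is actually empty (path_parts is a hypothetical helper name, not part of any module):

```perl
use strict;
use warnings;

# Hypothetical helper: split a path on '/' and drop the first field
# only when it is the empty string produced by a leading slash.
sub path_parts {
    my ($path) = @_;
    my @fields = split m{/}, $path;
    shift @fields if @fields && $fields[0] eq '';
    return @fields;
}

my @absolute = path_parts('/foo/bar');
my @relative = path_parts('foo/bar');

print "absolute: @absolute\n";
print "relative: @relative\n";
```

Both calls yield the same two elements, so the caller does not need to know whether the input was root-relative.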
brian d foy
Obviously, `splitdir` works on Unixy systems and Windows, but will it work as expected on other platforms as well?
Sinan Ünür
It's supposed to work on other platforms as well. That's the point. If you're talking about which subclass it decides to use, you might need to do a little magic there.
brian d foy
A: 

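split(' ', $string) will split the string on whitespace but not give you leading, trailing or internal empty fields.

I thought this generalized to other single-character strings, but it doesn't: only the single-space string is special. Doing a split on 'x' is simply equivalent to splitting on /x/.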
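A quick sketch of the difference, using sample strings of my own choosing:

```perl
use strict;
use warnings;

my $string = '  a  b  ';

# The special single-space string: skips leading whitespace and
# treats runs of whitespace as one separator.
my @special = split ' ', $string;

# A regex matching one space: every space is a separator,
# so empty leading and internal fields appear.
my @literal = split / /, $string;

# Any other string is treated as a pattern, so '/' behaves like m{/}.
my @slash = split '/', '/foo/bar';

print scalar(@special), " fields from split ' '\n";
print scalar(@literal), " fields from split / /\n";
print scalar(@slash),   " fields from split '/'\n";
```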

George Phillips
+2  A: 

You can use grep to remove any fields that are zero-length.

grep (length, split ('/','/foo/bar'))

I don't think split can do what you want on its own.

The people who are telling you to use a domain-specific function to do your splitting are correct. Domain-specific split equivalents will automatically handle various non-obvious special cases for you.
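A small sketch of the grep approach next to a plain split, with a sample string (my own) that also contains a doubled slash. Note that grep drops every zero-length field, wherever it occurs:

```perl
use strict;
use warnings;

my $path = '/foo//bar/';

# Plain split: empty leading and internal fields are kept.
my @raw = split m{/}, $path;

# grep with length keeps only the fields that are not zero-length.
my @filtered = grep { length } split m{/}, $path;

print scalar(@raw),      " raw fields\n";
print scalar(@filtered), " filtered fields\n";
```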

NXT
This has the odd effect of potentially removing elements from the middle. You probably want to get rid of those anyway, but maybe sometimes you don't.
brian d foy
A: 

Like this:

(undef,@x)= split /\//,$string;
Wouldn't that get rid of the first legitimate element if there isn't a delimiter right at the front of the string?
Platinum Azure
`(undef,@x)=split m'/', $string;`
Brad Gilbert
+1  A: 

brian d foy mentioned the File::Spec module. I really like this since it takes an intuitive approach and you know exactly what you are getting.

Depending on your scripting/programming style with Perl, you might want to try:

($volume, $directories, $file) = File::Spec->splitpath( $path );

The result is straightforward and if you need the volume for example, it's right at your fingertips!

And it makes your code a lot more readable! Just be careful, different modules have different specs regarding, for example, symbolic links or mounted disks.
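As a rough illustration, here is what splitpath returns for a Unix-style path. The path is made up, and File::Spec::Unix is called directly (my assumption, to keep the result platform-independent) rather than plain File::Spec:

```perl
use strict;
use warnings;
use File::Spec::Unix;

my $path = '/foo/bar/baz.txt';

# On Unix the volume is always the empty string.
my ( $volume, $directories, $file ) = File::Spec::Unix->splitpath($path);

# The directory portion can be broken up further; the empty fields
# from the leading and trailing slashes are filtered out here.
my @dirs = grep { length } File::Spec::Unix->splitdir($directories);

print "volume: '$volume'\n";
print "directories: $directories\n";
print "file: $file\n";
print "dirs: @dirs\n";
```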

Hayato
This might not be great for splitting a generic case URL. If you are working directly with a Linux filepath I would recommend it though.
Hayato
What do you think is different about a URL path and a Linux file path?
brian d foy
No, you are right, there's nothing different about it. I was thinking of parsing a path not specific to your server. But like you said, URLs all follow the same construct, so it wouldn't be a problem. Thanks!
Hayato