I was noticing some curious behavior with Perl's split function, particularly in cases where I would expect the resulting array to contain empty strings '', but it actually doesn't.

For example, if I have one or more delimiters at the end (or the beginning) of the string, the resulting array does not have empty strings '' as its last (or first) elements.

Example:

@s = split(/x/, 'axb');

produces a 2-element array ['a','b']

@s = split(/x/, 'axbx');

produces the same array

@s = split(/x/, 'axbxxxx');

produces the same array

But as soon as I put something at the end, all those empty strings do appear as elements:

@s = split(/x/, 'axbxxxxc');

produces a 6-element array ['a','b','','','','c']

Behavior is similar if the delimiters are at the beginning.

I would expect empty text between, before, or after delimiters to always produce elements in the split. Can anyone explain to me why the split behaves like this in Perl? I just tried the same thing in Python and it worked as expected.

Note: Perl v5.8

A:

From the documentation:

By default, empty leading fields are preserved, and empty trailing ones are deleted. (If all fields are empty, they are considered to be trailing.)
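A minimal Perl sketch of that default (no LIMIT) rule, including the "all fields empty" case; the variable names and sample strings are only illustrative:

```perl
use strict;
use warnings;

# No LIMIT given: empty trailing fields are deleted.
my @trailing  = split /x/, 'axbxxxx';   # ('a', 'b')

# If every field is empty, all of them count as trailing,
# so nothing is left.
my @all_empty = split /x/, 'xxx';       # ()

print scalar(@trailing), "\n";    # 2
print scalar(@all_empty), "\n";   # 0
```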

That explains the behavior you're seeing with trailing fields. This generally makes sense, since people are often very careless about trailing whitespace, for example. However, you can get the trailing blank fields if you want:

split /PATTERN/,EXPR,LIMIT

If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified.

So to get all trailing empty fields:

@s = split(/x/, 'axbxxxxc', -1);
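To see the difference the negative LIMIT makes, here is a small side-by-side sketch (the input string is just an example); joining with '|' makes the empty fields visible:

```perl
use strict;
use warnings;

my $str = 'axbxxxx';

# Default: trailing empty fields are stripped.
my @default = split /x/, $str;        # ('a', 'b')

# Negative LIMIT: treated like an arbitrarily large LIMIT,
# so every trailing empty field is kept.
my @all = split /x/, $str, -1;        # ('a', 'b', '', '', '', '')

print join('|', @default), "\n";      # a|b
print join('|', @all), "\n";          # a|b||||
```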

(I'm assuming you made a careless mistake when looking at leading empty fields; they definitely are preserved. Try split(/x/, 'xaxbxxxx'): the result has size 3.)

Jefromi
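A quick way to check the split(/x/, 'xaxbxxxx') example from the answer; quoting each field makes the leading empty string easy to spot:

```perl
use strict;
use warnings;

# The leading empty field is kept; the four trailing ones are dropped.
my @fields = split /x/, 'xaxbxxxx';

print scalar(@fields), "\n";                     # 3
print join(',', map { "'$_'" } @fields), "\n";   # '','a','b'
```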
When quoting from the docs, please include a link to the relevant document. I've added it for you this time.
cjm
Wow, I feel quite foolish for not going straight to the docs. And yeah, you're right: I must have made some careless error when testing the leading empty fields. I just tried it again and found them preserved. Thanks for the note about putting in the limit of -1, that helped me out!
Roman Stolper
@cjm thanks; I promise I usually do!
Jefromi
@Roman Stolper: leading empty fields are discarded only in the case of the special `split ' '`, which splits on any whitespace. (`split / /`, on the other hand, follows the normal rules, preserving leading empty fields and splitting only on space characters.)
ysth
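A small sketch contrasting the two forms from the comment above; the input line with two leading spaces is just an example:

```perl
use strict;
use warnings;

my $line = '  foo  bar';

# split ' ' (a string containing one space) is the special awk-like form:
# it splits on runs of whitespace and ignores leading whitespace.
my @awk_like = split ' ', $line;   # ('foo', 'bar')

# split / / is an ordinary regex: every single space is a delimiter
# and the leading empty fields are preserved.
my @literal = split / /, $line;    # ('', '', 'foo', '', 'bar')

print scalar(@awk_like), "\n";     # 2
print scalar(@literal), "\n";      # 5
```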
Well, not only in that case; the other case is that leading empty fields are discarded when the regex matches with zero width at the start of the string. See http://perlmonks.org/?node_id=322751 for discussion of this.
ysth
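An illustrative sketch of the zero-width case; the lookahead pattern and the sample strings are only examples chosen to show the contrast with a pattern that consumes a character:

```perl
use strict;
use warnings;

# (?=[A-Z]) is zero-width: it matches before each uppercase letter,
# including at position 0, but that initial match yields no empty field.
my @words = split /(?=[A-Z])/, 'FooBar';   # ('Foo', 'Bar')

# A pattern that consumes a character at the start does yield one.
my @with_leading = split /,/, ',a,b';      # ('', 'a', 'b')

print join('|', @words), "\n";             # Foo|Bar
print join('|', @with_leading), "\n";      # |a|b
```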