I'm trying to split a string using the split function but there isn't always a value between tokens.

Ex: ABC,123,,,,,,XYZ

I don't want to skip the multiple tokens though. These values are in specific positions in the string. However, when I do a split, and then try to step through my resulting array, I get "Use of uninitialized value" warnings.

I've tried comparing the value using $splitvalues[x] eq "" and I've tried using defined($splitvalues[x]), but I can't for the life of me figure out how to identify what the split function is putting into my array when there is no value between tokens.

Here's the snippet of my code (now with more crunchy goodness):

my @matrixDetail = ();

#some other processing happens here that is based on matching data from the 
#@oldDetail array with the first field of the @matrixLine array. If it does
#match, then I do the split
if($IHaveAMatch)
{
    @matrixDetail = split(',', $matrixLine[1]);
}
else
{
    @matrixDetail = ('','','','','','','');
}

my $newDetailString =
  (($matrixDetail[0] eq '') ? $oldDetail[0] : $matrixDetail[0])
. (($matrixDetail[1] eq '') ? $oldDetail[1] : $matrixDetail[1]) 
    .
    .
    .
. (($matrixDetail[6] eq '') ? $oldDetail[6] : $matrixDetail[6]);

Because these are just snippets, I've left some of the other logic out, but the if statement is inside a sub that returns the @matrixDetail array. If I don't find a match in my matrix and manually set the array to the list of empty strings, then I get no warnings. I only get them when split populates @matrixDetail.

Also, I should mention, I've been writing code for nearly 15 years, but only very recently have I needed to work with Perl. The logic in my script is sound (or at least, it works), I'm just being anal about cleaning up my warnings and trying to figure out this little nuance.

+3  A: 
#!perl

use warnings;
use strict;
use Data::Dumper;

my $str = "ABC,123,,,,,,XYZ";
my @elems = split ',', $str;
print Dumper \@elems;

This gives:

$VAR1 = [
          'ABC',
          '123',
          '',
          '',
          '',
          '',
          '',
          'XYZ'
        ];

It puts in an empty string.

Edit: Note that the documentation for split() states that "by default, empty leading fields are preserved, and empty trailing ones are deleted." Thus, if your string is ABC,123,,,,,,XYZ,,,, then your returned list will be the same as the above example, but if your string is ,,,,ABC,123, then you will have a list with four empty strings in elements 0 through 3, followed by 'ABC' and '123' (the empty field after the final comma is dropped).

Edit 2: Try dumping out the @matrixDetail and @oldDetail arrays. It's likely that one of those isn't the length that you think it is. You might also consider checking the number of elements in those two lists before trying to use them to make sure you have as many elements as you're expecting.
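
To make that length check concrete, here is a minimal sketch. The sample line is an assumption (the question doesn't show real data); it ends in empty fields, which is the case where split returns fewer elements than the code expects.

```perl
use strict;
use warnings;

# Hypothetical input: seven fields expected, but the last four are empty.
my $line   = 'ABC,123,XYZ,,,,';
my @fields = split ',', $line;

# split drops the trailing empty fields, so fewer elements come back.
print "got ", scalar(@fields), " fields\n";

# Elements past the end of the array are undef, not '', and comparing
# undef with eq is what triggers "Use of uninitialized value".
for my $i (0 .. 6) {
    my $state = !defined $fields[$i] ? 'undef'
              : $fields[$i] eq ''    ? 'empty string'
              :                        "'$fields[$i]'";
    print "$i: $state\n";
}
```

If the real data looks like this, the warnings come from the missing elements at the end of the array rather than from the empty strings in the middle.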

CanSpice
This is essentially what I'm doing, here's a snippet of my code:
MitchelWB
Put the code in your question.
CanSpice
apparently, I can't use the enter key? I'll try responding a different way. This is my first time using Stack Overflow
MitchelWB
You can edit your question to add more information. That's the best approach in this case. (and yeah, you can't add carriage returns for comments)
CanSpice
If you pass a limit to `split`, or -1 for all of the fields, you will get trailing empty tokens as well, i.e. `my @elems = split ',', $str, -1;` if the string might be `my $str = ",,ABC,123,,,,,,XYZ,,,";`. Otherwise the empty fields after `XYZ` will not be included in the split.
drewk
A: 

Delimiters with nothing between them give empty strings when split. Empty strings evaluate as false in boolean context.

If you know that your "details" input will never contain "0" (or other scalar that evaluates to false), this should work:

my @matrixDetail = split(',', $matrixLine[1]);
die if @matrixDetail > @oldDetail;

my $newDetailString = "";
for my $i (0..$#oldDetail) {
    $newDetailString .= $matrixDetail[$i] || $oldDetail[$i]; # thanks canSpice
}
say $newDetailString;  # needs "use feature 'say';" (Perl 5.10+)

(Besides the empty string, the scalars that evaluate to false are undef, the number 0, and the string "0".)

TMTOWTDI:

$matrixDetail[$_] ||= $oldDetail[$_] for 0..$#oldDetail;
my $newDetailString = join("", @matrixDetail);

edit: for loops now go from 0 to $#oldDetail instead of $#matrixDetail since trailing ",,," are not returned by split.

edit2: if you can't be sure that real input won't evaluate as false, you could always just test the length of your split elements. This is safer, definitely, though perhaps less elegant ^_^
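
A minimal sketch of that length test, with made-up data standing in for the question's arrays (the values in @oldDetail and the input line are assumptions). It checks defined first so that elements missing from the end of the split result don't cause warnings on older Perls.

```perl
use strict;
use warnings;

# Hypothetical data; only the array names come from the question.
my @oldDetail    = ('a', 'b', 'c', 'd', 'e', 'f', 'g');
my @matrixDetail = split ',', '0,,X,,,,';   # trailing empties are dropped

my $newDetailString = '';
for my $i (0 .. $#oldDetail) {
    my $new = $matrixDetail[$i];
    # Keep the new value only if it is defined and non-empty,
    # so a legitimate "0" is not replaced by the old value.
    $newDetailString .= (defined $new && length $new) ? $new : $oldDetail[$i];
}
print "$newDetailString\n";
```

Unlike the || version, this keeps a field whose value is "0".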

flies
See my edit in a few minutes. I'll add more to my snippet.
MitchelWB
This would be a perfect time to mention the defined-or operator, because then you don't need to worry about empty strings or false values, when all you're checking for is defined-ness.
CanSpice
@CanSpice the empty strings returned by split are defined.
flies
Er, yes. Never mind. Carry on. :-)
CanSpice
The empty strings returned by the split are defined, and that's why the loop I tried using defined($matrixDetail[$i]) to pick them out didn't work. But neither does comparing them to ''. I really just need to know how to identify the little buggers.
MitchelWB
as I indicated in my second edit, you can use `length()`.
flies
A: 

Empty fields in the middle will be ''. Empty fields on the end will be omitted, unless you specify a third parameter to split large enough (or -1 for all).
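
A short sketch of the difference, using a sample string with three trailing commas (the string is only an illustration):

```perl
use strict;
use warnings;

my $str = 'ABC,123,,,,,,XYZ,,,';   # sample input with trailing empty fields

my @default = split ',', $str;       # trailing empty fields are dropped
my @all     = split ',', $str, -1;   # a negative limit keeps them

print scalar(@default), " fields by default, ",
      scalar(@all), " fields with a limit of -1\n";
```

With the limit in place, every position in the line gets an element, so indexing by field position is safe.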

ysth
A: 

Hello,

I suggest using Text::CSV from CPAN. It is a ready-made solution that already covers the weird edge cases of parsing CSV-formatted files.
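
A minimal sketch of what that could look like for a single line, assuming Text::CSV is installed (it is not part of core Perl) and using a sample line rather than real data:

```perl
use strict;
use warnings;
use Text::CSV;   # from CPAN, must be installed separately

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag;

my $line = 'ABC,123,,,,,,XYZ,,,';   # sample input
$csv->parse($line) or die "Parse failed: " . $csv->error_diag;

# fields() returns one element per column, including empty trailing ones.
my @fields = $csv->fields;
print scalar(@fields), " fields\n";
```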

jira