I am trying to synchronize two data files that are listed by date so that I can make comparisons later on. However, I cannot seem to print only the lines where the dates match. So far I have separated the data from each file into two arrays. I need to find only the dates that are in both arrays and print them out. Any suggestions would be much appreciated.

Here is a sample of the raw data that I am working with; both files are in the same format:

09/11/2009,00:56:00,51.602,47.894,87,88,0,1032
09/12/2009,00:56:00,57.794,55.796,93,54,0,1023.6
09/13/2009,00:56:00,64.292,62.204,93,66,0,1014.4
09/14/2009,00:56:00,61.592,55.4,80,25,0,1009.6
09/15/2009,00:56:00,58.604,53.798,84,31,0,1009.1
09/16/2009,00:56:00,53.6,48.902,84,45,0,1017

I have split the dates out into an array for each file. My ultimate goal is to print only the lines of data for dates where both files have data. To do this, I wanted to compare the two arrays, whose elements are the dates.

My initial code looked like this:

foreach $bdate(@bdate){
while (<PL>){
    chomp;
    @arr = split (/,/);
    $pday=$arr[1];
    push @pdate, $pday;
    if ($bdate eq $pdate){
        print "$bdate,$pday\n";
    }
}
}
A: 

Are you opposed to using the external Unix utility "comm"?

barrycarter
I think the OP is looking to match on the first column of each line. comm is not much use for that.
RET
+3  A: 

One way (of many) would be to iterate once through each array, building a hash as follows:

for (@array1, @array2) {
    $dates{$_}++;
}

Then you can print the keys that correspond to values of 2 or more:

print $_,"\n" for grep {$dates{$_} > 1} keys %dates;

(untested, written on a machine with no perl)
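
A related sketch, in case the end goal is to print whole lines rather than bare dates: instead of counting, index the first file's lines by date in a hash and look each date from the second file up in it. This is a hash lookup rather than the counting approach above. It assumes the date is the first comma-separated field and that each date occurs at most once in the first file. The names @file1, @file2 and matching_lines are hypothetical, and the arrays reuse sample rows from the question in place of real file reads.

```perl
use strict;
use warnings;

# Stand-ins for the two data files, reusing sample rows from the question.
my @file1 = (
    '09/11/2009,00:56:00,51.602,47.894,87,88,0,1032',
    '09/12/2009,00:56:00,57.794,55.796,93,54,0,1023.6',
    '09/13/2009,00:56:00,64.292,62.204,93,66,0,1014.4',
    '09/14/2009,00:56:00,61.592,55.4,80,25,0,1009.6',
);
my @file2 = (
    '09/13/2009,00:56:00,64.292,62.204,93,66,0,1014.4',
    '09/14/2009,00:56:00,61.592,55.4,80,25,0,1009.6',
    '09/15/2009,00:56:00,58.604,53.798,84,31,0,1009.1',
    '09/16/2009,00:56:00,53.6,48.902,84,45,0,1017',
);

# Return [date, line_from_first, line_from_second] for every date found
# in both lists, in the order the dates appear in the second list.
sub matching_lines {
    my ($lines1, $lines2) = @_;

    # Index the first list's lines by date (the first field).
    my %by_date;
    for my $line (@$lines1) {
        my ($date) = split /,/, $line;
        $by_date{$date} = $line;
    }

    # Keep only the lines of the second list whose date was indexed.
    my @matches;
    for my $line (@$lines2) {
        my ($date) = split /,/, $line;
        push @matches, [ $date, $by_date{$date}, $line ]
            if exists $by_date{$date};
    }
    return @matches;
}

print "$_->[0]\n" for matching_lines(\@file1, \@file2);
```

With real files, the two arrays would be filled by reading each filehandle line by line and chomping each line.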

...and a quick CPAN search turns up List::Compare, with this example:

use List::Compare;

my $lc = List::Compare->new(\@Llist, \@Rlist);

my @intersection = $lc->get_intersection;
Ed Guiness
Thanks for the suggestion. However, the first method failed to omit dates that were in the second array and not in the first, and I couldn't use the second method because that module is not installed on the server where my data is. It's a school server, so I wouldn't be able to install it. Would you happen to have any other suggestions?
Paul
@Paul, This answer and the others are common approaches to finding common elements in lists (intersection). For that reason I suspect there might be something interesting with either your data or how you're interpreting it. Can you edit your question to include sample data and expected results?
Ed Guiness
It's very probable that how I am interpreting it is the problem; I am very new to Perl and am teaching myself. I have listed above a sample of the raw data that I am working with. In my script I have placed only the dates into an array for each of the two files. The first method you suggested worked well for eliminating dates from the first array that were not in the second array; however, several dates that did not exist in the second array were still printed. My expected result would be a list of only the dates common to both arrays.
Paul
After I fixed my data problem, your first suggestion worked extremely well. I had to sort the output so that it read in order, but that's a simple task. When I checked the results against a sample done by hand, it was spot on and even found a couple of mistakes. Thanks for all the help; this should help me finish my work.
Paul
In which case, Paul, please select this answer as the "accepted" answer.
Bart J
+1  A: 

Here's an example from perlfaq4 (simplified a bit):

my (@intersection, %count);

for my $element (@array1, @array2) { $count{$element}++ }

for my $element (keys %count) {
    push @intersection, $element if $count{$element} > 1;
}

A more idiomatic version:

my (%union, %isect);
for my $e (@array1, @array2) { $union{$e}++ && $isect{$e}++ }

my @intersection = keys %isect;

Both methods assume that each element is unique in a given array.
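
If that uniqueness assumption might not hold (for example, a date repeated within one file), a value that occurs twice in the first array reaches a count of 2 without ever appearing in the second. A minimal sketch of one way around this is to record membership of the first array in a hash and then filter the second array against it, skipping repeats. The sub name intersection and the sample dates are illustrative only.

```perl
use strict;
use warnings;

# Return the values present in both array refs, ignoring repeats
# within either one. Results follow the order of the second array.
sub intersection {
    my ($first, $second) = @_;

    # Membership table for the first array; repeats collapse to one key.
    my %in_first = map { $_ => 1 } @$first;

    # Keep each value of the second array once, and only if it is
    # also in the first.
    my %seen;
    return grep { $in_first{$_} && !$seen{$_}++ } @$second;
}

# '09/11/2009' is repeated in the first array but absent from the second,
# so plain counting would wrongly report it as common.
my @array1 = ('09/11/2009', '09/11/2009', '09/12/2009');
my @array2 = ('09/12/2009', '09/13/2009');

print "$_\n" for intersection(\@array1, \@array2);
```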

eugene y
Thanks for the suggestions. However, the first method didn't yield any results, and the second method produced results, but they were incorrect. When I manually inspected the output, it included several dates that were not in both arrays. Thanks for your time. Would you happen to have any other suggestions?
Paul