I have two files. One holds data that is updated every 10 minutes, and the other holds data that has already been used. I am trying to take each line from the new file, loop through every line of the second file, and see whether it matches one of them. If it matches, I don't want to use it; if there is no match, I want to append it to a string. With what I have so far, the check never seems to find a match even though there is one. Here is my code and a sample of the data from both files. CHECKHAIL and USEDHAIL are the two filehandles.

while(my $toBeChecked = <CHECKHAIL>){
        my $found = 0;
        seek USEDHAIL, 0, 0 or die "$0: seek: $!";
        while(my $hailCheck = <USEDHAIL>){
            if( $toBeChecked == $hailCheck){
                $found += 1;
            }
        }
        print USEDHAIL $toBeChecked;
        if ($found == 0){
            $toEmail .= $toBeChecked;
        }
    }
    print $toEmail;
    return;
}

CHECKHAIL sample data

2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)

2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)

2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

USEDHAIL sample data

2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
+1  A: 

Using $_ within an inner loop can cause problems. Try naming your lines first like so:

while(my $toBeChecked = <CHECKHAIL>){
    my $found = 0;
    while( my $hailCheck = <USEDHAIL>){

Also, Perl treats numeric comparison and string comparison differently. You're using the string operator eq where the numeric operator == is intended:

 if ($found eq 0){

Change to:

 if ($found == 0){
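
To illustrate the difference between the two operators, here is a minimal sketch. The helper name same_line and the shortened sample records are made up for this example; only the operators themselves come from Perl. It assumes two records that begin with the same number, as the two 2350 lines in the sample data do.

```perl
use strict;
use warnings;

# Hypothetical helper: true when two records contain the same text.
sub same_line {
    my ($left, $right) = @_;
    return $left eq $right;    # eq compares strings character by character
}

# Two different (shortened, made-up) records that both start with 2350.
my $flaxville = "2350  200  DANIELS  E FLAXVILLE  MT\n";
my $richland  = "2350  175  DANIELS  RICHLAND     MT\n";

{
    # == converts both sides to numbers first, so each side becomes 2350.
    # The "isn't numeric" warning is silenced only for this demonstration.
    no warnings 'numeric';
    print "==: ", ($flaxville == $richland ? "match" : "no match"), "\n";
}
print "eq: ", (same_line($flaxville, $richland) ? "match" : "no match"), "\n";
```

So == is the right operator for a counter such as $found, while whole lines of text need eq.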
Cfreak
That is very helpful to know, but I am still getting the same problem.
shinjuo
see the edit I just posted
Cfreak
I didn't know that either, but it is still not working.
shinjuo
Also, are you certain the lines are exactly the same? White space differences would also cause it to not match
Cfreak
They should be. I just copied and pasted from one file to the other.
shinjuo
If there is special whitespace, a tab for one, it doesn't always copy correctly. Copy from one file into two new files and use those as the check and used files.
coffeepac
+1  A: 

This line sticks out for me:

if ($found eq 0){

Since $found is a boolean, perform boolean tests on it:

if (not $found) {

It also looks like your logic is a bit reversed -- in the first if, you return if the lines do not match, and then in the second if, you return if there was a match. Do you perhaps intend to say next; to skip out of the innermost loop, instead?

Ether
Still not working. Did you mean to do if($found){ and then put the concatenation in the }else{ part? That's what I tried.
shinjuo
I don't even need the first else, actually. I just want to test whether it is a match.
shinjuo
+3  A: 

It never has an opportunity to succeed because of

while(<USEDHAIL>){
    my $hailCheck = $_;
    if( $toBeChecked eq $hailCheck){
        $found += 1;
    }else{
        return;  ### XXX
    }
}

On the first mismatch, the sub returns to its caller. You may have meant next instead, but for conciseness, you should remove the whole else clause. Remove the other else { return; } (corresponding to when $found is true) for the same reason.

Note that your algorithm has quadratic complexity and will be slow for large inputs. It'd be better to read the used records into a hash and then for each line of CHECKHAIL probe the %used hash to see whether it's been processed.

With those lines removed, I get

$ ./prog.pl 

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)

2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)

2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

As you can see, that still has a bug. You need to rewind USEDHAIL for each line of CHECKHAIL:

seek USEDHAIL, 0, 0 or die "$0: seek: $!";
while(<USEDHAIL>){
...

This produces

$ ./prog.pl 
2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)
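
As a rough illustration of why the rewind matters, the sketch below reads the same handle twice. It assumes an in-memory filehandle with made-up records as a stand-in for the used file, and count_remaining is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the used file: an in-memory filehandle.
my $used_text = "record A\nrecord B\n";
open my $used, '<', \$used_text or die "$0: open: $!";

# Count the lines that can still be read from the handle.
sub count_remaining {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    return $count;
}

my $first_pass  = count_remaining($used);   # reads through to end of file
my $second_pass = count_remaining($used);   # handle is still at end of file
seek $used, 0, 0 or die "$0: seek: $!";     # rewind to the beginning
my $after_seek  = count_remaining($used);   # all lines are readable again

print "first=$first_pass second=$second_pass after seek=$after_seek\n";
```

Without the seek, the second pass sees no lines at all, which is why only the first line of CHECKHAIL was ever compared against the used records.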

For an example of a better way to do it, consider

#! /usr/bin/perl

use warnings;
use strict;

sub read_used_hail {
  my($path) = @_;

  my %used;

  open my $fh, "<", $path or die "$0: open $path: $!";

  local $" = " ";  # " fix Stack Overflow highlighting
  while (<$fh>) {
    chomp;
    my @f = split " ", $_, 10;
    next unless @f;
    ++$used{"@f"};
  }

  wantarray ? %used : \%used;
}

my %used = read_used_hail "used-hail";
open my $check, "<", "check-hail" or die "$0: open: $!";

while (<$check>) {
  chomp;
  my @f = split " ", $_, 10;
  next if !@f || $used{join " " => @f};
  print $_, "\n";
}

Sample run:

$ ./prog.pl 
2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)
Greg Bacon
Where would I put my file paths in this? I am pretty new to Perl, if you can't tell, and some of this is stuff I have never used.
shinjuo
The code above hardcodes the unimaginative names `check-hail` and `used-hail` for your two input files.
Greg Bacon
The edited code above is what I tried, and it still did not work. I am now going to try working with what you just gave me and see where that goes.
shinjuo
On a side note, what is wrong with the code I posted above? It looks like the code you used, and yet it still doesn't work.
shinjuo
Your updated code should have `if( $toBeChecked eq $hailCheck){` rather than `==` because you're comparing strings. To be able to append records to the end of the used file, make sure to open it in read-and-append mode, e.g., `open USEDHAIL, "+>>", $path`, and don't forget to `seek` to the beginning before all reads.
Greg Bacon
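
A minimal sketch of read-and-append mode, assuming a temporary file created with File::Temp as a stand-in for the used file (the record contents are made up):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical used file that already holds one record.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "first\n";
close $tmp or die "$0: close: $!";

# "+>>" opens for both reading and appending.
open my $fh, '+>>', $path or die "$0: open $path: $!";

seek $fh, 0, 0 or die "$0: seek: $!";   # rewind before reading
my @before = <$fh>;

seek $fh, 0, 2 or die "$0: seek: $!";   # seek when switching from reading to writing
print {$fh} "second\n";                 # append mode writes at the end of the file

seek $fh, 0, 0 or die "$0: seek: $!";   # rewind again to read everything back
my @after = <$fh>;
close $fh or die "$0: close: $!";

print scalar(@before), " line(s) before, ", scalar(@after), " after\n";
```

The same pattern would let a single handle serve both for checking old records and for recording new ones.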
+2  A: 

Why wouldn't you just create a hash for the first (used) file?

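use strict;
use warnings;

# Remember every line of the used (older) file as a hash key.
my %fromUsedFile;
open USEDFILE, '<', '/the/data/file/that/is/10minutesold'
    or die "$0: open: $!";
$fromUsedFile{$_}++ while <USEDFILE>;
close USEDFILE;

# CHECKHAIL is assumed to be open for reading already, as in the question.
my $toBeEmailed = '';
while (my $toBeChecked = <CHECKHAIL>) {
    if (exists $fromUsedFile{$toBeChecked}) {
        # ... line is in both the new and old file
    } else {
        # ... line is only in the new file
        $toBeEmailed .= $toBeChecked;
    }
}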
mobrule
Because I don't know how to create a hash. I will try this, though.
shinjuo
This is giving me syntax errors for $usedFileName and $fromUSedFile. What declaration am I supposed to make for those?
shinjuo
@shinjuo - answer updated to work more out of the box with `strict` enabled.
mobrule
This worked very well. I appreciate the help. I am still reading the O'Reilly Learning Perl books, so I will hopefully get to hashes soon.
shinjuo