I've written a Perl script that opens two files, each containing a list. I want to find the items in the first list that are not in the second list. The script uses two foreach loops. The outer loop goes through each line of the first list and extracts the necessary item information. The inner loop goes through the second list, extracts the item information, and compares it to the item from the first list.

So, the idea is that, for each item in the first list, the script loops through all items in the second list, looking for matches. The trouble is that the inner foreach loop only runs once. I had this same problem in PHP when looping through MySQL tables in nested while loops. The solution there was to reset the index of the MySQL result using mysql_data_seek on each iteration of the outer loop. How can I do this in Perl with filehandles?

+5  A: 

If your inner loop is a filehandle iterator, then you will need to reset it (say, by closing and reopening the file) every time you reach it.

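foreach my $outer (@outer) {
   # Reopen the file on every pass so the inner read starts at the first line
   open my $inner_fh, '<', $inner_file or die "Cannot open $inner_file: $!";
   while (my $inner = <$inner_fh>) {
      ...
   }
   close $inner_fh;   # optional: a lexical filehandle is closed when it goes out of scope
}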
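Since the question asks for an equivalent of mysql_data_seek, here is a minimal sketch of rewinding the filehandle with Perl's built-in seek instead of reopening the file. The temporary file, the sample items, and the @missing array are assumptions made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data standing in for the second list.
my ($tmp_fh, $inner_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "apple\nbanana\n";
close $tmp_fh or die "Cannot close $inner_file: $!";

my @outer = ('apple', 'cherry');   # stand-in for items from the first list
my @missing;

open my $inner_fh, '<', $inner_file or die "Cannot open $inner_file: $!";
foreach my $outer (@outer) {
    # Rewind to the start of the file before each inner pass.
    seek($inner_fh, 0, 0) or die "Cannot seek in $inner_file: $!";
    my $found = 0;
    while (my $inner = <$inner_fh>) {
        chomp $inner;
        if ($inner eq $outer) { $found = 1; last; }
    }
    push @missing, $outer unless $found;
}
close $inner_fh;

print "$_\n" for @missing;   # items in the first list but not the second
```

The file is opened once, and seek($inner_fh, 0, 0) moves the read position back to byte 0, which also clears the end-of-file state left by the previous pass.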

Alternatively, if you can spare the memory, you could copy the filehandle output to an array outside of the loop and then iterate over the array.

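# Read the whole file into memory once, before the loops
open my $inner_fh, '<', $inner_file or die "Cannot open $inner_file: $!";
my @inner_lines = <$inner_fh>;
close $inner_fh;

foreach my $outer (@outer) {
    foreach my $inner (@inner_lines) {
       ...
    }
}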
mobrule
Thanks, this is a great answer, and exactly what I needed.
smfoote
@smfoote, don't just say "thanks", vote it up and check the check mark beside it. And in the future, post some code with your question.
Paul Tomblin
I don't have enough reputation to vote it up, and I did check the check mark. I understand that having code is generally useful, but I think this problem didn't require code, and the evidence is that mobrule was able to easily answer the question without seeing code. Generally when I add my code, the answers and comments are distracted from what I'm actually trying to figure out, and people start lecturing me on the quality of my code. I'm not really a fan of that.
smfoote
@smfoote, yeah, having people try to teach you something must be a real bummer.
Paul Tomblin
Well, it wasn't that easy ;-) But I concur with everyone else: when even a little code is provided, it is easier to guess what the OP needs to know (which is not always the same as what is being asked), and that increases the chance that you will get a useful answer.
mobrule
+3  A: 

Note that the approach you describe sounds very inefficient: O(n·m), where n and m are the lengths of the two lists. You can get O(n+m) by putting the relevant contents of one file into a hash and then iterating over the other file only once.
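A minimal sketch of the hash approach, assuming the item information extracted from each line can serve directly as a hash key. The two sample arrays and the names %in_second and @only_in_first are hypothetical; in the real script the items would come from the two files.

```perl
use strict;
use warnings;

# Hypothetical sample data; in practice these would be the item keys
# extracted from each line of the two files.
my @first_list  = ('apple', 'banana', 'cherry');
my @second_list = ('banana', 'date');

# One pass over the second list: record every item as a hash key.
my %in_second = map { $_ => 1 } @second_list;

# One pass over the first list: keep items with no matching key.
my @only_in_first = grep { !exists $in_second{$_} } @first_list;

print "$_\n" for @only_in_first;   # prints apple, then cherry
```

Each exists check is a single key lookup, so the second list is never rescanned for each item of the first.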

Svante
If I put the relevant contents of the first file in a hash and then iterate through the second file, I would still have to run through the hash on each iteration of the second file, I think. The code definitely could be more efficient, but neither of the files is very large, so the difference was no more than a few seconds. I should probably fix it anyway, so I don't get into bad habits.
smfoote
@smfoote: if you can design the hash so that you can compare by key, then this comparison becomes O(1) with respect to the hash size instead of O(n). The "trick" of a hash is that you do not have to resort to even a binary search (O(log n)).
Svante