views:

282

answers:

2

I see a lot of 'related' questions showing up, but none I looked at answer this specific scenario.

During a while/for loop that parses a result set generated from a SQL select statement, what is the best way to prevent the next line from being outputted if the line before it contains the same field data (whether it be the 1st field or the xth field)?

For example, if two rows were:
('EML-E','[email protected]','John','Smith')
('EML-E','[email protected]','John','Smith')

What is the best way to print only the first row based on the fact that 'EML-E' is the same in both rows?

Right now, I'm doing this:

-Storing the first field (specific to my scenario) into a 2-element array (dupecatch[1])
-Checking if dupecatch[0] = dupcatch[1] (duplicate - escape loop using 's')
-After row is processed, set dupecatch[0] = dupecatch[1]

while ($DBS->SQLFetch() == *PLibdata::RET_OK)
{
    $s=0; #s = 1 to escape out of inside loop
    while ($i != $array_len and $s==0)
    {
     $rowfetch = $DBS->{Row}->GetCharValue($array_col[$i]);
     if($i==0){$dupecatch[1] = $rowfetch;} #dupecatch prevents duplicate primary key field entries
     if($dupecatch[0] ne $dupecatch[1])
     {
      dosomething($rowfetch);
     }
     else{$s++;}
     $i++;
    }
    $i=0;
    $dupecatch[0]=$dupecatch[1];
}
+7  A: 

That is that standard way if you only care about duplicate items in a row, but $dupecatch[0] is normally named $old and $dupecatch[1] normally just the variable in question. You can tell the array is not a good fit because you only ever refer to its indices.

If you want to avoid all duplicates you can use a %seen hash:

my %seen;
while (defined (my $row = get_data())) {
    next if $seen{$row->[0]}++; #skip all but the first instance of the key
    do_stuff();
}
Chas. Owens
that makes sense... whats %seen hash all about?
CheeseConQueso
oh now i remember why i went with the array, $rowfetch changes on every iteration of the inner while loop. thats why i set the old value after the inner loop is done. I could have used 2 variables, but the array seemed better
CheeseConQueso
Arrays are only better if they are used as collections. You are only using it with its indices, therefore you are not using it as a collection. A variable named $old is more descriptive.I will put an example of a %seen hash in the answer.
Chas. Owens
+1  A: 

I suggest using DISTINCT in your SQL statement. That's probably by far the easiest fix.

Steven Devijver
Based on the requirement (collapse contiguous duplicate lines), it would be a mistake: distinct would remove all but the first of the "foo"s from ("foo", "bar", "foo", "foo") when only the last should be removed.
Chas. Owens
I am using distinct in the first two selects, if I wanted to get the distinct to work the way its supposed to, I would have to dump the second select into a temp table and then select distinct off that. I figure that too many temp tables is not in my best interest though.
CheeseConQueso