views:

131

answers:

3

I'm iterating over a very large set of strings, which iterates over a smaller set of strings. Due to the size, this method takes a while to do, so to speed it up, I'm trying to delete the strings from the smaller set that no longer needs to be used as it goes along. Below is my current code:

    Ms::Fasta.foreach(@database) do |entry|
        all.each do |set|
            if entry.header[1..40].include? set[1] + "|"
                startVal = entry.sequence.scan_i(set[0])[0]

                if startVal != nil
                    @locations << [set[0], set[1], startVal, startVal + set[1].length]
                    all.delete(set)
                end
            end
        end
    end

The problem I face is that the easy way, array.delete(string), effectively adds a break statement to the inner loop, which messes up the results. The only way I know how to fix this is to do this:

Ms::Fasta.foreach(@database) do |entry|
        i = 0

        while i < all.length
            set = all[i]

            if entry.header[1..40].include? set[1] + "|"
                startVal = entry.sequence.scan_i(set[0])[0]

                if startVal != nil
                    @locations << [set[0], set[1], startVal, startVal + set[1].length]
                    all.delete_at(i)
                    i -= 1
                end
            end

            i += 1
        end
    end

This feels kind of sloppy to me. Is there a better way to do this? Thanks.

A: 

It is efficient, but you it can be made better. In java, the iterators actually disallow this, throwing out ConcurrentModificationException, which I find it is actually good practice (and more important when you are using threads).

The cleanest solution is to make a copy of the array you are iterating. The copy is usually very low on memory, as the objects you store on them are usually larger part of the memory footprint, unless you are storing ints and symbols on them.

Either way, it is quite common that the most efficient solution will not be cleanest or the easiest to understand. You do the math/profile and decide if it is worth it.

Daniel Ribeiro
+3  A: 

use delete_if

array.delete_if do |v|
    if v.should_be_deleted?
        true
    else
        v.update
        false
    end
end
banister
Funny if :).` array.delete_if { |v| v.should_be_deleted? }`
Yossi
A: 

Well, from what I've seen, it doesn't look like there's any better way. Besides, I found it faster to use a hash.

Jesse J