I wrote a small Python program that iterates over a data file (*input_file*) and performs calculations. If a calculation result reaches certain states (stateA or stateB), information (hits) is extracted from the result. Which hits are extracted depends on the parameters from three parameter sets.
I used a dictionary of dictionaries to store my parameter sets (*param_sets*) and a dictionary of lists to store the hits (*hits*). The dictionaries *param_sets* and *hits* have the same keys.

The problem is that the lists within the hits dictionary are somehow coupled. When one list changes (through a call to the *extract_hits* function), the others change, too.

Here is the (shortened) code:

import os, sys, csv, pdb
from operator import itemgetter

# define three parameter sets
param_sets = {
    'A' : {'MIN_LEN' : 8, 'MAX_X' : 0, 'MAX_Z' : 0},
    'B' : {'MIN_LEN' : 8, 'MAX_X' : 1, 'MAX_Z' : 5},
    'C' : {'MIN_LEN' : 9, 'MAX_X' : 1, 'MAX_Z' : 5}}

# to store hits corresponding to each parameter set
hits = dict.fromkeys(param_sets, [])

# calculations
result = []
for input_values in input_file:
    # do some calculations
    result = do_some_calculations(result, input_values)
    if result == stateA:
        for key in param_sets.keys():
            hits[key] = extract_hits(key,
                                     result,
                                     hits[key],
                                     param_sets[key]['MIN_LEN'],
                                     param_sets[key]['MAX_X'],
                                     param_sets[key]['MAX_Z'])
        result = []  # discard results, start empty result list
    elif result == stateB:
        for key in param_sets.keys():
            hits[key] = extract_hits(key,
                                     result,
                                     hits[key],
                                     param_sets[key]['MIN_LEN'],
                                     param_sets[key]['MAX_X'],
                                     param_sets[key]['MAX_Z'])
        result = [] # discard results
        result = some_calculation(input_values) # start new result list
    else:
        result = some_other_calculation(result) # append result list



def extract_hits(k, seq, hits, min_len, max_au, max_gu):
    max_len = len(seq)
    for sub_seq_size in reversed(range(min_len, max_len+1)):
        for start_pos in range(0,(max_len-sub_seq_size+1)):
            from_inc = start_pos
            to_exc = start_pos + sub_seq_size
            sub_seq = seq[from_inc:to_exc]
            # complete information about helical fragment sub_seq
            helical_fragment = get_helix_data(sub_seq, max_au, max_gu)
            if helical_fragment:
                hits.append(helical_fragment)
                # search seq regions left and right from sub_seq for further hits
                left_seq = seq[0:from_inc]
                right_seq = seq[to_exc:max_len]
                if len(left_seq) >= min_len:
                    hits = sub_check_helical(left_seq, hits, min_len, max_au, max_gu)
                if len(right_seq) >= min_len:
                    hits = sub_check_helical(right_seq, hits, min_len, max_au, max_gu)
                print 'key', k                 # just for testing purpose
                print 'new', hits              # just for testing purpose
                print 'frag', helical_fragment # just for testing purpose
                pdb.set_trace()                # just for testing purpose
                return hits # appended
    return hits # unchanged

Here is some output from the Python debugger:

key A
new ['x', 'x', 'x', {'y': 'GGCCGGGCUUGGU'}]
frag {'y': 'GGCCGGGCUUGGU'}
> 
-> return hits
(Pdb) c
key B

new [{'y': 'GGCCGGGCUUGGU'}, {'y': 'CCGGCCCGAGCCG'}]
frag {'y': 'CCGGCCCGAGCCG'}
> extract_hits()
-> return hits
(Pdb) c
key C
new [{'y': 'GGCCGGGCUUGGU'}, {'y': 'CCGGCCCGAGCCG'}, {'y': 'CCGGCCCG'}]
frag {'y': 'CCGGCCCG'}
> extract_hits()
-> return hits

The elements from key A should not be present in key B, and the elements from keys A and B should not be present in key C.

+4  A: 

Dictionaries and lists are passed around by reference by default: assignment binds another name to the same object rather than copying it. For a dictionary, instead of:

hits_old = hits      # just for testing purpose

it would be:

hits_old = hits.copy()      # just for testing purpose

This makes a shallow copy of the dictionary's key/value pairs, resulting in an equivalent dictionary that will not reflect keys added to or reassigned in the hits dictionary afterwards.

Of course, hits_old in the second function is actually a list, not a dictionary, so you would want to do something akin to the following to copy it:

hits_old = hits[:]

In case you're wondering, lists in Python 2 don't have a copy() method (list.copy() was only added in Python 3.3).
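To illustrate why an append inside a function such as extract_hits shows up in the caller's list, here is a minimal sketch. The helper names append_in_place and append_to_copy are hypothetical and are not part of the question's code.

```python
def append_in_place(items, value):
    # 'items' is the caller's list object, so append() changes it for everyone
    items.append(value)
    return items

def append_to_copy(items, value):
    # slicing makes a new (shallow) list; the caller's list is left alone
    result = items[:]
    result.append(value)
    return result

original = ['x']
same = append_in_place(original, 'y')   # returns the very same list object
fresh = append_to_copy(original, 'z')   # returns a separate list
```

After these calls, original holds the 'y' appended by the first function but not the 'z' appended by the second.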

Walt W
Thanks, this works out. Still, I wonder why the list elements from key 'A' are copied into the list stored at key 'B' within the global hits dictionary. Maybe I am just blind to the obvious, but I really can't figure it out.
SimonSalman
I think it's because of how Python passes objects by reference and how you're handling the keys there. Here's a link: testingreflections.com/node/view/5126, and here's another one straight from the horse's mouth: docs.python.org/reference/datamodel.html
Thomas Schultz
An alternative to .copy or [:] is to just create a new list, passing the old list to the constructor, i.e. `newlist = list(other_list)` or `new_dict = dict(other_dict)`
Brian
Thanks, guys! I really thought Python would pass by value. All problems solved, and I guess I learned some essential stuff.
SimonSalman
+8  A: 

Your line:

hits = dict.fromkeys(param_sets, [])

is equivalent to:

hits = dict()
onelist = []
for k in param_sets:
    hits[k] = onelist

That is, every entry in hits has as its value the SAME list object, initially empty, no matter what key it has. Remember that assignment does NOT perform implicit copies: rather, it assigns "one more reference to the RHS object".
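The sharing can be checked directly with the `is` operator. This is a small sketch in which param_sets is reduced to its keys (the empty inner dictionaries are stand-ins, not the real parameters):

```python
param_sets = {'A': {}, 'B': {}, 'C': {}}  # stand-in: only the keys matter here

hits = dict.fromkeys(param_sets, [])

# every key maps to one and the same list object
shared = hits['A'] is hits['B'] is hits['C']

# appending through one key is therefore visible through all keys
hits['A'].append('frag')
```

Here shared is True, and after the append the lists seen through 'B' and 'C' contain 'frag' as well, which matches the debugger output in the question.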

What you want is:

hits = dict()
for k in param_sets:
    hits[k] = []

that is, a NEW AND SEPARATE list object as each entry's value. Equivalently,

hits = dict((k, []) for k in param_sets)
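As a quick check that each key now gets its own list, the sketch below uses the dict comprehension syntax (available from Python 2.7 on) and, as one possible alternative that the answer itself does not mention, collections.defaultdict. The reduced param_sets and the name lazy_hits are illustrative assumptions.

```python
from collections import defaultdict

param_sets = {'A': {}, 'B': {}, 'C': {}}  # stand-in: only the keys matter here

# the expression [] is evaluated once per key, so each value is a new list
hits = {k: [] for k in param_sets}
hits['A'].append('frag')

# alternative: a defaultdict creates a new empty list on first access of a key
lazy_hits = defaultdict(list)
lazy_hits['A'].append('frag')
untouched = lazy_hits['B']
```

With either variant, appending to the list for 'A' leaves the lists for the other keys empty.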

BTW, when you do need to make a (shallow) copy of a container, the most general approach is to call the container's type, with the old container as the argument, as in:

newdict = dict(olddict)
newlist = list(oldlist)
newset = set(oldset)

and so forth; this also works to transform containers between types (newlist = list(oldset) makes a list out of a set, and so on).
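"Shallow" matters for a dictionary of lists like hits: the copy gets its own set of keys, but the values are still the same list objects. A minimal sketch, with made-up sample data and with copy.deepcopy shown as the standard-library way to copy the nested lists too:

```python
import copy

olddict = {'A': ['x'], 'B': []}

shallow = dict(olddict)         # new dict, same inner lists
deep = copy.deepcopy(olddict)   # new dict with new inner lists

olddict['A'].append('y')        # in-place change to a nested list
olddict['C'] = []               # new key in the original only
```

Afterwards shallow has no key 'C' but does see the appended 'y', while deep is unaffected by both changes.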

Alex Martelli
This was the missing puzzle piece. Very good explanation!
SimonSalman
Nice catch, good answer.
hughdbrown
@Simon, always glad to help, though it does seem very weird indeed to set "community wiki" for such an exquisitely technical question -- why ever did you do THAT?!
Alex Martelli